A suitable molecule configuration for a large enterprise

Question asked by SatyaKomatineni3761 on Jun 4, 2016
Latest reply on Jul 29, 2016 by Srinivas Chandrakanth Vangari


These are just my notes, based on what I have learned over the last few months. Please have them validated by your own Boomi architect before relying on any of it.


These notes address how to configure molecules for ETL jobs across as many as 10 to 15 independent projects.


How many nodes do you need in the molecule?

How many molecules do you need?

How much memory should each node have?

Should it be forked?

How much memory should each forked process get?


Expected load


1. 10 to 15 projects in a year

2. 5 to 10 large ETL processes per project

3. 10 to 15 parallel nightly processes that need to finish within 2 to 4 hours
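
As a rough sizing aid, the ranges above can be multiplied out. This is only a sketch: the variable names are made up for illustration, and it assumes each project really does contribute its full range of large ETL processes within the year.

```python
# Ranges copied from the list above, as (low, high) pairs.
PROJECTS_PER_YEAR = (10, 15)
ETL_PROCESSES_PER_PROJECT = (5, 10)
NIGHTLY_PARALLEL_PROCESSES = (10, 15)
NIGHTLY_WINDOW_HOURS = (2, 4)


def product_range(a, b):
    """Multiply two (low, high) ranges end to end."""
    return (a[0] * b[0], a[1] * b[1])


# Large ETL processes the molecule may have to host after one year.
etl_processes_per_year = product_range(PROJECTS_PER_YEAR, ETL_PROCESSES_PER_PROJECT)
print(etl_processes_per_year)  # (50, 150)
```

So the molecule may end up hosting somewhere between 50 and 150 large ETL processes, of which 10 to 15 are expected to run in parallel on a given night.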


Molecule or Private cloud


I haven't explored the private cloud option because of time constraints; we started with a molecule when we got going. So this note only answers the molecule question.


Likely setup we will go with


1. 4 nodes (Linux)

2. Forked mode with 1G of RAM per process

3. 32G of RAM per machine (2G for OS + 1G for main node + 29 work units/processes with 1G each available per node)
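
The per-machine memory budget above is simple arithmetic, sketched here so the numbers can be varied. `NodeBudget` and its field names are hypothetical helpers for this note, not Boomi settings, and the sketch assumes the whole remainder after the OS and the main process is available to forked processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeBudget:
    """Memory budget for one molecule node, in GB (illustrative only)."""
    total_gb: float = 32.0   # RAM per machine
    os_gb: float = 2.0       # reserved for the OS
    main_gb: float = 1.0     # the 1G reserved for the main node/process
    forked_gb: float = 1.0   # memory per forked process

    def worker_slots(self) -> int:
        """How many forked processes fit in what is left."""
        remaining = self.total_gb - self.os_gb - self.main_gb
        return int(remaining // self.forked_gb)


NODES = 4
slots_per_node = NodeBudget().worker_slots()
cluster_slots = NODES * slots_per_node

print(slots_per_node)  # 29
print(cluster_slots)   # 116
print(NodeBudget(forked_gb=0.5).worker_slots())  # 58
```

With 1G per forked process the 4 nodes offer 116 slots in total; halving the forked memory to 512MB doubles the slots per node to 58.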




1. A flow control can break work based on "processes"

2. Say we break the work 10 ways.

3. In this configuration the main process will be spawned on one of the nodes with 1G of RAM

4. Now the 10 work units will look for hosts. 4 will go to the 4 nodes, each with its own 1G process. The other 6 will become additional JVM processes on those same 4 nodes. So three of the nodes end up with 2 JVM processes each and one node carries 4 JVM processes

5. A flow control can additionally break work based on "threads". These run inside each of the forked JVMs as further work units

6. Keeping the forked memory small ("1G") is useful because a process developer can choose more or fewer processes based on the memory needs. (It may even be good to use 512MB, as a developer can always choose more processes to get more memory.) However, the main process may need more memory to consolidate.

7. We also chose forked mode because we don't want one process to adversely impact another process and bring down the load
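
The trade-off in steps 4 to 6 can be put into numbers. This is a sketch with hypothetical helper names; it reads step 4 as three nodes with 2 forked JVMs and one node with 4, and it assumes each forked JVM gets exactly the configured forked memory.

```python
def forked_memory_gb(processes: int, heap_gb: float = 1.0) -> float:
    """Total memory across the forked JVMs of one execution (main process excluded)."""
    return processes * heap_gb


def work_units(processes: int, threads_per_process: int = 1) -> int:
    """Parallel work units when flow control splits by processes, then by threads."""
    return processes * threads_per_process


# Per-node forked JVM counts for the 10-way split described in step 4.
placement = [2, 2, 2, 4]
assert sum(placement) == 10

print(forked_memory_gb(10))        # 10.0 GB with 1G per process
print(forked_memory_gb(20, 0.5))   # 10.0 GB with 512MB per process
print(work_units(10, 4))           # 40 units with 4 threads per process
```

A developer who needs more memory for a job asks for more processes; 20 processes at 512MB reach the same total as 10 processes at 1G, but with finer-grained control over how much is taken.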


Real time vs ETL


1. Here we chose to separate real time load from ETL load as we don't want one to impact the other

2. For the real time load we happened to choose atoms as the load carriers. We could have chosen molecules for that as well. I am not sure whether we will stick with atoms or move to molecules for real time load in the next iteration


Significant find


Having forked execution gives 2 benefits:

1. Developers can scale memory up and down using flow control and processes

2. The process isolation gives managers more confidence that one team is not blaming another


We are very early


It is still too early for me to have fully borne these results out.


I just want to share what I have come to know, right or wrong.


Test the configurations for yourself. I hope this helps a bit in your search.


Unknowns for us that need more research


1. Is a private cloud better?

2. Even for real time, are we better off with molecules or private clouds?

3. How much throughput can this current setup carry?