This article discusses how to size your file system disk space and index node (inode) limit in Linux to anticipate your Atom file system usage.
Disk space is a very important factor to consider when sizing a local Atom, Molecule, or Atom Cloud.
The AtomSphere User Guide discusses disk space in several areas, especially when considering anticipated data volume: Atom, Molecule, and Atom Cloud setup.
Dell Boomi provides general guidance for minimum hardware requirements at this link: Atom system requirements (excerpts copied below as of July 18, 2016).
Minimum hardware requirements
A single Atom, Molecule node, or Cloud Molecule within an Atom Cloud can run on hardware ranging from business-class workstations to dedicated servers. You should allocate a minimum of 50 MB of hard disk space for run-time and configuration, and 10 GB for data archiving.
Minimum hardware requirements for high volumes of data
A single Atom, Molecule node, or Cloud Molecule within an Atom Cloud that must process high volumes of data - approximately 100,000 records per hour, 100 requests per minute, or files larger than 2 GB is considered “high volume” - should allocate a minimum of 100–200 GB of hard disk space.
This article describes additional considerations for sizing file system disk space for your Atom, Molecule, or Atom Cloud, and also addresses recommendations for setting Inode limits in Linux (for Atoms, Molecules, or Atom Clouds installed on Linux). For simplicity, the terminology for an Atom is used; the discussion applies to Molecules and Atom Clouds as well, unless specifically noted.
Before we discuss how to size and configure file system disk space, let's review the most common symptom of an improperly sized disk and some ways to reduce the file usage of the Dell Boomi processes that you develop.
Most common symptom of improperly sized disk space
The most common symptom of an improperly sized disk is the No Space Left on Device error. This error is not always readily visible. For example, the error might be embedded in a process or container log such as this:
Severe errors occurred during start shape execution, terminating process. (com.boomi.process.ProcessException) Caused by: Unable to create new index segment. ((com.boomi.store.DataStoreException)) Caused by: Error storing data, file not found. ((com.boomi.store.DataStoreException)) Caused by: /.../Boomi_AtomSphere/Atom/Atom_Name/data/2016.7.18/123456789.dat.pmeta.0 (No space left on device) ((java.io.FileNotFoundException))
This is just one example of the error. The same error can occur for other read or write file operations on other types of files as well.
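Because the error can be buried deep in a log, a small scan can help surface it. The following is a minimal Python sketch, not a Dell Boomi utility: the function name `find_no_space_errors` and the two INFO lines in the sample are hypothetical placeholders, and only the line containing the error text is modeled on the example above.

```python
import re

# Error text from the example above. Matching is case-insensitive because
# the casing can differ between log sources.
NO_SPACE = re.compile(r"no space left on device", re.IGNORECASE)


def find_no_space_errors(log_lines):
    """Return (line_number, line) pairs for lines that mention the error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if NO_SPACE.search(line)
    ]


# Hypothetical sample: the INFO lines are invented placeholders.
sample = [
    "INFO process started",
    "Caused by: /.../Boomi_AtomSphere/Atom/Atom_Name/data/2016.7.18/"
    "123456789.dat.pmeta.0 (No space left on device)",
    "INFO process finished",
]

hits = find_no_space_errors(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```

To scan a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.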
Ways to reduce file usage in the processes you develop
There are multiple ways to reduce file system usage during the development phase to prevent the No Space Left on Device error. Each of these methods may or may not be applicable to your use case, but each should be reviewed and considered:
- Increase purging levels to minimize Atom disk space. Keep only the data that you need: Setting the purge schedule
- Incorporate a separate disk for archiving data that is needed long term: Connector operation's Archiving tab or Archive File Location and Name
- Configure working data storage (specific to a Molecule or Atom Cloud): Molecule and Atom Cloud working data storage
- Run your process in Low Latency mode: Low latency processes
- Change the directory nesting level of both the data and the execution directories: Atom Data Directory Structure and Process Execution Directory Structure
- For best practices in process development, we recommend conducting a design review with the Dell Boomi Professional Services team. They can provide common patterns and help you implement an efficient process design to achieve high volume execution while being able to still track the data.
In addition to the above considerations, you should do some planning and testing up front to estimate how much disk space you will need. The following sections offer some ideas on how to approach that planning.
How to do capacity planning with a varied number of processes and different file system utilizations
Every Dell Boomi process shape writes files to disk when the process is not executed in low latency mode. Experience has shown that some shapes may create more files in the system than others (e.g. Flow Control, Connectors). The configuration and order of shapes in each process can also affect the number of files that are created and used in the Atom installation directory file system.
Therefore, it can be difficult to determine how many files will be created by the Atom in the file system and, for Linux, how many inodes will be used.
For capacity planning, it is always recommended that you lean toward the higher end of our minimum Atom system requirements. Consider doubling the minimum amount of disk space you estimate you need, just to be safe.
It is also recommended that you perform some testing around capacity planning. For example, here are some sample tests to consider:
For overall disk usage testing
First, select a command or tool to monitor the Atom installation disk storage location. Then, install and start up a new Atom, Molecule, or Atom Cloud on a new disk.
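If you prefer a script to a command-line tool, the monitoring step could be sketched in Python as follows. This is only an illustration: `ATOM_DIR` is a placeholder for your Atom installation directory, and `disk_usage_report` is a hypothetical helper name. The percentage is computed as used / total, which can differ slightly from what df reports on file systems that reserve blocks.

```python
import shutil

# Placeholder: replace with the Atom installation directory you want to monitor.
ATOM_DIR = "."


def disk_usage_report(path):
    """Return total, used, and free bytes plus percent used for the
    file system that contains `path`."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "percent_used": round(percent_used, 1),
    }


report = disk_usage_report(ATOM_DIR)
print(report)
```

Running this before and after a test gives the two measurements needed for the estimate.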
Here is an example of how to test disk usage. Remember, this is only an example; you might have different requirements and, therefore, different disk space numbers and different test cases.
- Initially, we allocate 50 GB of disk space for the Atom installation.
- Before running anything on the Atom, we run the command/tool to monitor disk space usage. We see that it has about 50 MB of disk space in use, which is less than 1% utilization.
- Prepare a test. For this example, let's select a use case where we anticipate having 5000 executions of a listener process in General mode over 2 hours. (For more information on General mode, refer to: Process Options dialog and Performance troubleshooting of General mode processes.)
- Execute the test with the process in General mode. It should generate about 40 requests per minute on average.
- Run your command/tool to monitor Atom installation disk utilization after the 2-hour test. Atom installation disk usage should increase after the test. For this example, let's suppose that we observe 3% utilization after the test. That's about 1.5 GB of disk space used.
- Calculate your estimate over a longer period of time. For example, if this process runs at this rate for a full day, the estimated usage would be 1.5 GB x 12 two-hour intervals = 18 GB per day. Therefore, taking into account strictly disk space (not inode usage yet), you can roughly estimate that the 50 GB of disk space would be used up in just under 3 days (50 GB / 18 GB per day).
- Determine how much data retention you require under the Atom. For example, if you want to keep 7 days of data under the Atom (purge every 7 days), a minimum of about 126 GB (7 x 18 GB) would be needed (but more might be recommended). Even though this test is below our criteria for "high volume," it still fell within our minimum recommendation for high volume. Therefore, minimum recommendations are just that: minimums. You will likely need to go higher depending on your requirements.
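The arithmetic in the steps above can be sketched as a small Python function. It is a rough linear projection that assumes the usage rate measured during the test stays constant. The function name `project_disk_usage` is hypothetical, and the inputs are the example figures: 1.5 GB used in a 2-hour test, a 50 GB disk, and 7 days of retention.

```python
def project_disk_usage(gb_used_in_test, test_hours, disk_gb, retention_days):
    """Linearly extrapolate disk usage measured during a test run.

    Returns (GB per day, days until the disk is full,
    GB needed to hold `retention_days` of data).
    """
    gb_per_day = gb_used_in_test * 24 / test_hours
    days_until_full = disk_gb / gb_per_day
    gb_for_retention = gb_per_day * retention_days
    return gb_per_day, days_until_full, gb_for_retention


# Example figures from the test described above.
gb_per_day, days_until_full, gb_for_retention = project_disk_usage(
    gb_used_in_test=1.5, test_hours=2, disk_gb=50, retention_days=7
)
print(f"Estimated usage: {gb_per_day:.0f} GB per day")
print(f"Disk full in about {days_until_full:.1f} days")
print(f"Space needed for 7 days of retention: {gb_for_retention:.0f} GB")
```

With the example inputs, this works out to roughly 18 GB per day, just under 3 days until the 50 GB disk is full, and about 126 GB for 7 days of retention. Real usage is rarely perfectly linear, so treat the result as a lower bound rather than a target.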
Keep in mind that our minimum Atom system requirements state that if you need to process approximately 100,000 records per hour, or receive 100 requests per minute, or process files larger than 2 GB, that would be considered “high volume”. The minimum recommendation for high volume is 100-200 GB, but sizing your disk space for high volume can vary depending on several factors, including but not limited to:
- the complexity of the process design
- the size of your data
- the number of documents per execution
- the frequency of the executions in a given period of time
- how long you want to keep your data under the Atom (your desired data retention period or purge setting).
Therefore, estimation and testing are both recommended!
After you measure the needed disk space for a particular process, the amount could change if more processes are added. So you should consider your full suite of processes when estimating and testing your disk space usage. Determine the minimum requirements and worst case scenario for each process. Then consider ways to reduce disk space usage (as mentioned above) or add more disk space, if appropriate.
There is a possibility that you could do all of the above, but still get the No Space Left on Device error... The above analysis does not take into account inode usage, which could be another factor on Linux systems.
For Linux inode usage testing
On Linux, you might get the No Space Left on Device error while your disk space still shows very low utilization. In that case, it could mean you have run out of inodes (see: The Inode Object).
The Dell Boomi Atom creates a number of sub-directories and files per execution, especially when a process is run in General mode. Each file and directory consumes an inode, and the inode structure itself occupies disk space. Scenarios can occur where the inode usage rate increases faster than the disk space usage rate.
The inode structure has limited space and fills faster when many very small files are created in the system. The number of inodes (and therefore files) that a file system can hold is limited by defaults chosen when the file system is set up (Linux - What is the maximum number of files a file system can contain? - Server Fault).
For an extreme example, if one large file of 50 GB is created in a file system where 50 GB of disk space is free, only 1 inode is used but the disk is full. At the other extreme, if 64,000 small files of 2 bytes each are created in the file system, the data itself totals well under 0.001 GB, yet you could reach your inode limit and get the No Space Left on Device error. It's a generic error, but one that could mean you've reached the inode limit that was set at the file system level.
If you have a Linux-based Atom, you need to ensure that you have enough inodes on the Linux-based file system that is hosting your Atom.
The Dell Boomi Atom creates many small files when processes are run in General mode to store execution metadata (in addition to actual data, depending on your purge settings), so that the data can be retrieved and displayed in Process Reporting. The metadata files can be numerous but small in size. Expect something in between the two extreme examples above. As these many small files are created, so are the inodes that go with them. Therefore, the inode structure could use its allocated disk space more quickly than the actual data stored.
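To get a feel for where a directory tree falls between those two extremes, you can count its entries and compare that count with the bytes stored. The Python sketch below is illustrative only: `tree_profile` is a hypothetical helper, the `.meta` file names are invented, and the demo runs on a throwaway temporary directory rather than a real Atom data directory. Each file or directory normally corresponds to one inode (hard links aside).

```python
import os
import tempfile


def tree_profile(root):
    """Count entries (files plus sub-directories) and total file bytes
    under `root`. Each entry normally corresponds to one inode."""
    entries = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        entries += len(dirnames) + len(filenames)
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # the file disappeared (for example, purged) mid-walk
    return entries, total_bytes


# Demo: 100 two-byte files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(100):
        with open(os.path.join(demo_dir, f"f{i}.meta"), "w") as handle:
            handle.write("ok")
    entries, total_bytes = tree_profile(demo_dir)

print(f"{entries} entries holding {total_bytes} bytes")
```

A tree with many entries and very few bytes per entry is the profile that tends to exhaust inodes before disk space.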
Before we describe how to properly set your inode limit, here are some tools you can use to check and verify whether the Linux file system is running out of inodes.
On Linux, you can use the df command to get a full summary of disk space and inode usage on the system:
- To display file system disk space statistics in “human-readable” format (provides details in bytes, megabytes, and gigabytes)
Command: df -h
- To display the file system type of your system along with other information
Command: df -T
- To display information on the number and percentage of used inodes for the file system
Command: df -i
For example, running the last command could result in the following output:
df -i /
Filesystem  Inodes  IUsed   IFree  IUse%  Mounted on
/dev/       884736  884736  0      100%   /
The IUse% value of 100% means all inodes have been used.
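The same inode figures can be read from a script. The following Python sketch uses `os.statvfs`, which is available on Unix-like systems. The helper name `inode_usage` and the "." path are placeholders, and some file systems report zero total inodes, in which case no percentage is computed.

```python
import os


def inode_usage(path):
    """Return inode counts for the file system that contains `path`,
    similar to the columns shown by df -i."""
    stats = os.statvfs(path)
    total = stats.f_files   # total inodes on the file system
    free = stats.f_ffree    # free inodes
    used = total - free
    # Some file systems report 0 total inodes; skip the percentage then.
    percent = round(100.0 * used / total, 1) if total else None
    return {"inodes": total, "iused": used, "ifree": free, "iuse_percent": percent}


# "." is a placeholder; use the Atom installation directory instead.
info = inode_usage(".")
print(info)
```

Logging this value at the start and end of a test gives the inode measurements needed for the estimate.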
A quick search on Google shows that running out of inodes is a fairly common issue on Linux file systems.
As a temporary solution, you can purge files, which frees inodes as well as disk space. However, adding more disk space might not address the issue because your inode limit could still be set too low.
Another easy alternative to consider is low latency mode. If a process is run in low latency mode, it does not store its execution history and data in Process Reporting (unless there are errors). Instead, the process tracks a higher level of success rate in the AtomSphere Real-time Dashboard, which has a smaller inode footprint.
Still, you might need to increase your inode limit!
Here is another example to help you determine what limit to set:
- Let's say your inode allocation for your root disk space is set at 2 GB.
- At the start of a test similar to the one above, use one or more of the df commands to measure inode utilization (in addition to measuring disk space). For this example, less than 1% was used (only 300 KB) at the start of the test.
- After your test completes in 2 hours, measure your inode utilization again. For this example, let's say after testing we find that about 5% of inodes (or 90 MB) were used.
- Calculate your inode usage forward. For example, if this process were to run at this rate for 24 hours (1 day), it would use 12 x 5% = 60% of your inode allocation, so the inodes would be exhausted in less than 2 days. This might not be sufficient for your retention period. You might need to increase your maximum inode disk space allocation. Reference: Files - How can I increase the number of inodes in an ext4 filesystem? - Unix & Linux Stack Exchange.
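The inode projection above can be sketched the same way as the disk space estimate. In this Python example, `project_inode_usage` is a hypothetical helper, and the 7-day retention period is an assumption borrowed from the disk space example rather than a figure stated for inodes. The projection is linear and assumes the measured rate stays constant.

```python
def project_inode_usage(percent_used_in_test, test_hours, retention_days):
    """Linearly extrapolate inode usage measured during a test run.

    Returns (percent of inodes used per day, days until exhaustion,
    factor by which the current inode allocation must grow to cover
    `retention_days` of data).
    """
    percent_per_day = percent_used_in_test * 24 / test_hours
    days_until_exhausted = 100 / percent_per_day
    growth_factor = percent_per_day * retention_days / 100
    return percent_per_day, days_until_exhausted, growth_factor


# Example: 5% of inodes used in a 2-hour test; 7-day retention is assumed.
percent_per_day, days_until_exhausted, growth_factor = project_inode_usage(
    percent_used_in_test=5, test_hours=2, retention_days=7
)
print(f"Inode usage: {percent_per_day:.0f}% of the allocation per day")
print(f"Inodes exhausted in about {days_until_exhausted:.1f} days")
print(f"Allocation needed for 7 days: {growth_factor:.1f}x the current one")
```

A growth factor above 1 suggests that the current inode allocation would not last for the retention period at the measured rate.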
When sizing the disk space for your on-premise Dell Boomi Atom, Molecule, or Atom Cloud, there are several considerations to be made, including but not limited to:
- Review the AtomSphere User Guide and the recommendations in this article to determine what configurations are best for your requirements.
- Allocate sufficient disk space by estimating, testing, and tuning where appropriate.
- Specify an appropriate inode allocation (if using an Atom on Linux) by considering if default settings are sufficient, testing and estimating, and testing again.
- Consider Low Latency mode for high volume processes, unless historical data for successful executions is absolutely needed.
- Conduct process design reviews to improve efficiency when achieving high volume execution while still being able to track the data.