We have a process which uses JMS messages as drivers for processing files. The messages contain metadata about the files, including the location on the filesystem to read for processing. A Disk Get connector is used to read the files, with Directory and File Filter set to the exact directory and file name of the file to read. On the connector, we also have the following set:
- File Matching Type: Wildcards
- Max Files to Read: 1
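I don't know the connector's internals, but if it does something like `java.io.File.listFiles` with a name filter under the hood, then even an exact-name filter still has to walk the entire directory listing. A hypothetical sketch (the file name is made up):

```java
import java.io.File;
import java.io.FilenameFilter;

public class ListScan {
    // Count entries whose name equals `exact`. Even for an exact-name
    // filter, listFiles() reads the whole directory listing first and
    // applies the filter to every entry -- cost grows with file count.
    static int countMatches(File dir, final String exact) {
        File[] matches = dir.listFiles(new FilenameFilter() {
            @Override
            public boolean accept(File d, String name) {
                return name.equals(exact);
            }
        });
        return matches == null ? 0 : matches.length;
    }

    public static void main(String[] args) {
        // Hypothetical exact file name from the JMS message metadata.
        System.out.println(countMatches(new File("."), "payload-12345.xml"));
    }
}
```

If that is roughly what the connector does, the slowdown we see would be explained by the listing itself, not the read of the single file.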
We have run into situations in which the number of files in the staging directory gets large, greater than 1,000. When this happens, the Disk connector takes progressively longer to read each file. For example, with a small number of files in the directory the connector reads a file in milliseconds; with 6,000 files it takes 3-4 seconds per read, and so on.
The only way for us to catch up is to stop staging new files, which causes backlogs elsewhere.
I presume this slowdown comes from the underlying Java API scanning the entire directory listing to apply the filter. One resolution we are looking at is to create additional sub-directories (based on timestamp), which would cap the number of files that can accumulate in any single directory.
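To sketch the sub-directory idea (the path layout and hour granularity here are just assumptions, not a worked-out design): the staging process would derive the target directory from the file's timestamp, and the JMS message would carry that full path as it does today.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class StagingPath {
    // Build a sub-directory path like 2024/05/17/09 so that no single
    // directory accumulates more than roughly an hour's worth of files.
    static String subDirFor(Date timestamp) {
        return new SimpleDateFormat("yyyy/MM/dd/HH").format(timestamp);
    }

    public static void main(String[] args) {
        // Hypothetical staging root; the real root comes from our config.
        System.out.println("staging/" + subDirFor(new Date()));
    }
}
```

Since the Directory value in the connector is taken from the message metadata anyway, pointing it at the deeper path shouldn't change the processing flow, only the listing size.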
Has anyone else experienced this or have ideas? I'm not sure whether changing the File Matching Type to Regular Expression would have an impact; I don't think so.
One other thing to note is that we are using Java 7. I believe there are some enhancements in Java 8. Does the Disk connector make use of those?
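For what it's worth, `Files.newDirectoryStream` with a glob pattern is already available on Java 7 (Java 8 added `Files.list` on top of NIO.2). It iterates entries lazily instead of materializing the whole `File[]` up front, though finding one name among n entries is still O(n). Whether the connector uses it, I don't know. A sketch, with a made-up file name:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NioList {
    // Collect entry names matching the glob. Java 7's DirectoryStream
    // streams entries lazily rather than building one big array.
    static List<String> matches(Path dir, String glob) throws IOException {
        List<String> out = new ArrayList<String>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                out.add(p.getFileName().toString());
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical exact name, as the connector's File Filter would supply.
        System.out.println(matches(Paths.get("."), "payload-12345.xml").size());
    }
}
```

Even with this API, the sub-directory partitioning above is what actually reduces n, so the two approaches would complement each other.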