
How to efficiently split docs via custom script?

Question asked by Adam Arrowsmith (Employee) on May 26, 2015
Latest reply on May 26, 2015 by James Ahlborn

I have a scenario in which I need to split a document into multiple chunks based on an arbitrary byte size (e.g. 4 MB per chunk). I have a working script that does this; however, my concern is memory usage, since I need to store each entire chunk in a byte array in order to create a new InputStream to store in the dataContext.


Any suggestions for a better or more memory-efficient way to handle this, either through Java or the Atom utility libraries? Conceptually, I would iterate through the incoming stream in reasonably sized chunks, writing to some OutputStream, but then how do I get it back to an InputStream? Thanks!
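On the OutputStream-to-InputStream question, one plain-Java pattern (no Boomi-specific APIs assumed, and the helper name `nextChunk` is my own) is to accumulate each chunk in a ByteArrayOutputStream and wrap its byte array in a ByteArrayInputStream. This still holds one chunk in memory at a time, but never more than that, and the small read buffer means the accumulator only grows to the bytes actually read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Main {
    // Read up to blockSize bytes from 'in' and return them as a new
    // InputStream, or null once the source stream is exhausted.
    // Loops on read() because a single read may return fewer bytes
    // than requested even when more data remains.
    static InputStream nextChunk(InputStream in, int blockSize) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(blockSize);
        byte[] buf = new byte[8192];
        int remaining = blockSize;
        while (remaining > 0) {
            int n =, 0, Math.min(buf.length, remaining));
            if (n == -1) break;           // source exhausted
            bos.write(buf, 0, n);
            remaining -= n;
        }
        return bos.size() == 0 ? null : new ByteArrayInputStream(bos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // 1100 bytes split at 500 bytes/chunk -> chunks of 500, 500, 100
        InputStream src = new ByteArrayInputStream(new byte[1100]);
        int chunks = 0;
        InputStream chunk;
        while ((chunk = nextChunk(src, 500)) != null) {
            chunks++;
        }
        System.out.println(chunks); // prints 3
    }
}
```

In a Boomi script the returned InputStream would go to dataContext.storeStream() in place of the out variable; peak memory stays at roughly one chunk plus the 8 KB read buffer.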

import java.util.Properties
import java.util.Arrays
import com.boomi.execution.ExecutionUtil
import com.boomi.execution.ExecutionManager

logger = ExecutionManager.getCurrent().getBaseLogger()

// private static final int BlockSize = 4 * 1024 * 1024
final int BlockSize = 500

for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i)
    Properties props = dataContext.getProperties(i)

    byte[] buffer = new byte[BlockSize]"Buffer.Length=" + buffer.length)

    int bytesRead = 0
    int blockNumber = 0
    int totalBytes = 0

    // << CONCERN: reads an entire BlockSize chunk into the byte[] buffer
    while ((bytesRead =, 0, BlockSize)) != -1) {"blockNumber=" + blockNumber)"totalBytesBefore=" + totalBytes)
        totalBytes = totalBytes + bytesRead"totalBytesAfter=" + totalBytes)

        // Trim the final (short) block down to the bytes actually read
        if (bytesRead < BlockSize) {
            buffer = Arrays.copyOf(buffer, bytesRead)
        }

        out = new ByteArrayInputStream(buffer)
        dataContext.storeStream(out, props)

        // Allocate a fresh buffer: the ByteArrayInputStream above keeps a
        // reference to the old array rather than copying it
        buffer = new byte[BlockSize]
        blockNumber++
    }
}
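One small memory saving in the script above: the Arrays.copyOf call on the final short block allocates a second array just to trim it. ByteArrayInputStream has a (byte[], offset, length) constructor that exposes only the valid region of the existing buffer without copying. A minimal illustration (the 123-byte "short read" here is simulated, not from a real stream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Main {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[500];
        int bytesRead = 123; // pretend the last read returned a short block

        // Wrap only the first bytesRead bytes; no Arrays.copyOf needed.
        InputStream chunk = new ByteArrayInputStream(buffer, 0, bytesRead);

        int count = 0;
        while ( != -1) count++;
        System.out.println(count); // prints 123
    }
}
```

The caveat is the same one the script already handles by reallocating buffer each iteration: ByteArrayInputStream wraps the array rather than copying it, so the buffer must not be reused while the stored stream may still be read.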