
How to efficiently split docs via custom script?

Question asked by Adam Arrowsmith (Employee) on May 26, 2015
Latest reply on May 26, 2015 by James Ahlborn

I have a scenario in which I need to split a document into multiple chunks based on an arbitrary byte size (e.g. 4MB per chunk). I have a working script that does this; however, my concern is memory usage, since I need to store each entire chunk in a byte array in order to create a new InputStream to store in the dataContext.


Any suggestions for a better or more memory-efficient way to handle this, either through Java or the Atom utility libraries? Conceptually I would think to iterate through the incoming stream in reasonably sized chunks, writing to some OutputStream, but then how do I get that back to an InputStream? Thanks!

import java.util.Properties;
import java.util.Arrays;
import java.io.InputStream;
import java.io.ByteArrayInputStream;
import com.boomi.execution.ExecutionManager;

logger = ExecutionManager.getCurrent().getBaseLogger();

//final int BlockSize = 4 * 1024 * 1024; // 4MB chunks for real use
final int BlockSize = 500; // small block size for testing (script-level variables can't be private/static in Groovy)

for (int i = 0; i < dataContext.getDataCount(); i++) {
    InputStream is = dataContext.getStream(i);
    Properties props = dataContext.getProperties(i);

    byte[] buffer = new byte[BlockSize];
    logger.info("Buffer.Length=" + buffer.length);

    int bytesRead = 0;
    int blockNumber = 0;
    int totalBytes = 0;

    // << CONCERN: reads an entire block into a byte[] buffer.
    // Note: read() may also return fewer than BlockSize bytes before the
    // end of the stream, which would produce an undersized chunk.
    while ((bytesRead = is.read(buffer, 0, BlockSize)) != -1) {
        logger.info("blockNumber=" + blockNumber);
        logger.info("totalBytesBefore=" + totalBytes);
        totalBytes = totalBytes + bytesRead;
        logger.info("totalBytesAfter=" + totalBytes);

        // Trim the final (partial) block so no stale bytes are emitted.
        if (bytesRead < BlockSize) {
            buffer = Arrays.copyOf(buffer, bytesRead);
        }

        // Each chunk becomes a new document with the original properties.
        InputStream out = new ByteArrayInputStream(buffer);
        dataContext.storeStream(out, props);

        buffer = new byte[BlockSize];
        blockNumber++;
    }
}
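Regarding the OutputStream-to-InputStream part of the question: the standard JDK mechanism for that round-trip is a pipe. Below is a minimal sketch using plain java.io (not a Boomi-specific utility); whether dataContext.storeStream consumes the stream promptly enough to keep the pipe draining is an assumption to verify.

import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

// Reader end: this is the InputStream you would hand to storeStream().
final PipedInputStream chunkIn = new PipedInputStream();
// Writer end, connected to the reader at construction.
final PipedOutputStream chunkOut = new PipedOutputStream(chunkIn);

// The producer must run on its own thread: the pipe has a small fixed
// internal buffer, so writing and reading from one thread can deadlock.
Thread producer = new Thread(new Runnable() {
    public void run() {
        try {
            byte[] data = "example chunk contents".getBytes("UTF-8");
            chunkOut.write(data); // in real use, write one chunk's bytes here
        } catch (Exception e) {
            // log and give up; closing below signals end-of-stream to the reader
        } finally {
            try { chunkOut.close(); } catch (Exception ignored) { }
        }
    }
});
producer.start();

// chunkIn now yields exactly the bytes written above, streamed through
// the pipe's buffer rather than materialized as one big byte[].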
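Alternatively, the OutputStream round-trip could be avoided entirely by wrapping the source stream so that each stored document exposes at most BlockSize bytes of it, with no chunk ever materialized in memory. A minimal sketch follows; BoundedInputStream is a hypothetical helper (not part of the Atom libraries), and it assumes storeStream fully drains each stream before the next wrapper is created.

import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: exposes at most 'limit' bytes of an underlying
// stream, then reports end-of-stream, without copying anything.
class BoundedInputStream extends InputStream {
    private final InputStream in;
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        this.in = in;
        this.remaining = limit;
    }

    public int read() throws IOException {
        if (remaining <= 0) return -1;   // chunk boundary reached
        int b = in.read();
        if (b != -1) remaining--;
        return b;
    }

    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;   // chunk boundary reached
        int n = in.read(buf, off, (int) Math.min((long) len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}

Each call to dataContext.storeStream(new BoundedInputStream(is, BlockSize), props) would then store one chunk; iterate until the source stream reports end of stream. Again, this only works if storeStream consumes each stream eagerly, which is worth confirming against the Atom's actual behavior.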

