How to remove the first line of data using Groovy

Document created by Adam Arrowsmith on Oct 26, 2011
You need to strip out the first line of a flat file or CSV document (e.g. the column headers) to simplify processing later in the Process.

Note: The newer AtomSphere Split and Combine options can retain and consolidate column headers natively, which all but eliminates the need for this script.

Use a Data Process step with Custom Scripting, and replace its sample script with the script below.

Code:

//This script strips the first line (e.g. the column headers) out of each document
//and outputs the rest of the document contents unaltered.
//Documents are assumed to be UTF-8 encoded; the same charset is used for reading and writing.

newline = System.getProperty("line.separator");

for ( int i = 0; i < dataContext.getDataCount(); i++ ) {
    InputStream is = dataContext.getStream(i);
    Properties props = dataContext.getProperties(i);

    BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    StringBuilder outData = new StringBuilder();

    // Skip the first line (readLine() returns null if the document is empty)
    reader.readLine();

    // Copy every remaining line, re-joined with the platform line separator
    String line;
    while ( (line = reader.readLine()) != null ) {
        outData.append(line);
        outData.append(newline);
    }
    reader.close();

    is = new ByteArrayInputStream(outData.toString().getBytes("UTF-8"));

    dataContext.storeStream(is, props);
}
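Because dataContext only exists inside a Data Process step, it can help to check the line-skipping logic on its own before pasting it into AtomSphere. The following is a minimal sketch in plain Java (Groovy accepts very similar syntax), assuming UTF-8 input. The class StripFirstLine and the method stripFirstLine are hypothetical helpers for local testing, not part of the AtomSphere API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StripFirstLine {

    // Hypothetical helper: returns the document minus its first line,
    // re-joining the remaining lines with the given separator.
    // Assumes the input is UTF-8 encoded.
    static String stripFirstLine(InputStream in, String newline) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            reader.readLine(); // discard the header line (null for an empty document)
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append(newline);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,Alice\n2,Bob\n";
        InputStream in = new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8));
        System.out.print(stripFirstLine(in, "\n")); // prints the two data rows
    }
}
```

Passing "\n" explicitly keeps the output easy to compare. The Groovy script instead uses the JVM's line.separator property, so on a Windows Atom the remaining lines are re-joined with "\r\n" regardless of the line endings in the incoming file.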