Insert a Byte Order Mark (BOM) in front of a .csv file

Document created by mike_c_frazier (Employee) on Jan 7, 2013. Last modified by Adam Arrowsmith on May 13, 2016. Version 3.

In some applications, such as MS Excel versions prior to 2010, UTF-8 encoding may not be recognized unless the file starts with a byte order mark (BOM). The behavior also differs depending on whether the file is opened with File > Open or through a Windows file association.
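For reference, the UTF-8 BOM is the code point U+FEFF encoded as UTF-8, which yields the three bytes EF BB BF. The following is a minimal Java sketch that shows this; the class and method names (`BomBytes`, `utf8BomHex`) are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Encode U+FEFF as UTF-8 and return the bytes as an upper-case hex string.
    // The name utf8BomHex is a hypothetical helper, not part of any library.
    static String utf8BomHex() {
        byte[] bom = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bom) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(utf8BomHex()); // prints "EF BB BF"
    }
}
```

A file whose first three bytes match this sequence is what applications such as Excel look for when deciding to treat the content as UTF-8.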

 

Add the attached Groovy script to a Data Process step before the step that writes the data to a .csv file.

 

The script inserts the BOM at the start of each document. As a result, MS Excel should recognize the file as UTF-8 whether it is opened with File > Open or through a Windows file association.

 

// This script adds a UTF-8 byte order mark (BOM) to the beginning of each document in a set.
// Groovy imports java.io.* and java.util.* by default, so no explicit imports are needed.

newline = System.getProperty("line.separator");
// The single character U+FEFF becomes the bytes EF BB BF when encoded as UTF-8.
bom = "\uFEFF";

for ( int i = 0; i < dataContext.getDataCount(); i++ ) {
   InputStream is = dataContext.getStream(i);
   Properties props = dataContext.getProperties(i);

   // Decode the document as UTF-8 rather than the platform default charset.
   reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
   outData = new StringBuffer();
   lineNum = 0;

   while ( (line = reader.readLine()) != null ) {
      // Prepend the BOM to the first line only.
      if (lineNum == 0) {
         line = bom + line;
      }

      outData.append(line);
      outData.append(newline);
      lineNum++;
   }

   // Encode explicitly as UTF-8 so the BOM is written as EF BB BF.
   is = new ByteArrayInputStream(outData.toString().getBytes("UTF-8"));
   dataContext.storeStream(is, props);

}
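
The script above works on decoded text, one line at a time. An alternative is to work on the raw bytes, which avoids depending on any charset setting, leaves line endings untouched, and makes it easy to skip documents that already start with a BOM. The following is a minimal Java sketch of that idea, not the method used in the original script; the class and method names (`Utf8Bom`, `hasBom`, `ensureBom`) are hypothetical, and wiring it into a Data Process step is left out.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bom {
    // The UTF-8 byte order mark: EF BB BF.
    private static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data already starts with the UTF-8 BOM.
    static boolean hasBom(byte[] data) {
        return data.length >= BOM.length
                && data[0] == BOM[0]
                && data[1] == BOM[1]
                && data[2] == BOM[2];
    }

    // Returns the data with a BOM in front; data that already has one is returned unchanged.
    static byte[] ensureBom(byte[] data) {
        if (hasBom(data)) {
            return data;
        }
        byte[] out = new byte[BOM.length + data.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(data, 0, out, BOM.length, data.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,Ann\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = ensureBom(csv);
        System.out.println(hasBom(csv));                 // false
        System.out.println(hasBom(withBom));             // true
        System.out.println(withBom.length - csv.length); // 3
    }
}
```

The same `hasBom` check can also be used to confirm that a generated .csv file begins with EF BB BF before it is opened in Excel.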