Working with Excel Files

Document created by Adam Arrowsmith Employee on Dec 3, 2015Last modified by Adam Arrowsmith Employee on May 7, 2017
Version 8Show Document
  • View in full screen mode

This article describes how to work with Microsoft Excel files using custom scripting to convert to/from delimited flat file format.

 

 

Overview

Although AtomSphere does not natively support Excel as a profile type you can still work with Excel files by using a simple Groovy script to convert them to/from delimited flat file data structures.

 

Note: As of the April 2017 release which introduced Custom Library components, you can use this script in the Dell Boomi Atom Cloud in addition to local Atoms.

 

Prerequisites

The scripts require additional third-party Java libraries to be manually imported into the Atom.

  1. Download the Apache POI binary distribution for your operating system from here http://poi.apache.org/download.html. Look for the Resource Archives > Binary Artifacts.
    • These scripts have been verified against version 3.13. The scripts may be compatible with other versions but not guaranteed.
    • Obtain the *.zip file for Windows or *.tar.gz file for Linux. Examples:
      • poi-bin-3.13-20150929.zip
      • poi-bin-3.13-20150929.tar.gz
  2. Extract the following jars from the archive and copy them to the Atom directory: ../<Atom root>/userlib/script:
    • poi-3.13-20150929.jar
    • poi-ooxml-3.13-20150929.jar
    • poi-ooxml-schemas-3.13-20150929.jar
    • ooxml-lib/xmlbeans-2.6.0.jar
  3. Restart the Atom.

 

Reading an Excel File

Reading an Excel file works by converting the native Excel workbook to an AtomSphere flat file document. The source Excel file may be XLS or XLSX format.

 

Assumptions

  • Workbook content should be simple tabular data. Advanced features like formulas, pivot tables, charts, etc. are not supported.
  • Data should begin in cell A1 and run contiguously.
  • Cell values should not contain embedded line breaks.
  • The Excel file may contain one or more sheets.
  • There is no maximum number of rows or columns however the script has not been verified for very large workbooks.

 

Usage

  1. Read the Excel file into the process using the appropriate connector. This will typically be a file-based connector such as disk or FTP.
  2. Add a Data Process shape with a Custom Scripting step. Replace the entire script with the one below.
  3. If the Excel file contains multiple worksheets, a separate document will be created for each. The worksheet name is captured in a Dynamic Document Property named "SHEET". You can use this property to identify and route each document accordingly, to different mapping or process shapes if desired.
  4. If you need to map or work with the Excel data, you will need to manually create or import a flat file profile that matches the columns of each sheet. The flat file profile should be configured with the following Options:
    • Use Column Headers = true or false, depending if the first row in the Excel sheet contains column headers
    • File Type = Delimited
    • File Delimiter = Comma Delimited
    • Text Qualifier = Double Quote

 

Script

import java.util.Locale;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.DateUtil;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

for (int i = 0; i < dataContext.getDataCount(); i++) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);

DataFormatter formatter = new DataFormatter(Locale.default);
Workbook wb = WorkbookFactory.create(is);
List sheetList = wb.sheets;
StringBuilder sb = new StringBuilder();

for (int j = 0; j < sheetList.size(); j++) {
  Sheet sheet = wb.getSheetAt(j);
  for (Row row: sheet) {
   for (Cell cell: row) {
    switch (cell.getCellType()) {
     case Cell.CELL_TYPE_NUMERIC:
      if (DateUtil.isCellDateFormatted(cell)) {
       sb.append("\"" + formatter.formatCellValue(cell) + "\"");
      } else {
       sb.append("\"" + cell.getNumericCellValue() + "\"");
      }
      break;
     case Cell.CELL_TYPE_STRING:
      sb.append("\"" + cell.getStringCellValue() + "\"");
      break;
     case Cell.CELL_TYPE_FORMULA:
      sb.append("\"" + cell.getCellFormula() + "\"");
      break;
     case Cell.CELL_TYPE_BOOLEAN:
      sb.append("\"" + cell.getBooleanCellValue() + "\"");
      break;
     case Cell.CELL_TYPE_BLANK:
      sb.append("");
      break;
     default:
      sb.append("");
      break;
    }
    sb.append(",");
   }
   if (sb.length() > 0) {
    sb.setLength(sb.length() - 1);
   }
   sb.append("\r\n");
  }

  String output = sb.toString();
  sb.setLength(0);
  is = new ByteArrayInputStream(output.getBytes());
  props.setProperty("document.dynamic.userdefined.SHEET", sheet.getSheetName());
  dataContext.storeStream(is, props);}
}

 

Writing an Excel File

Writing an Excel file works by converting an AtomSphere flat file document into a native Excel workbook. The result Excel file will be XSLX format.

 

Assumptions

  • The flat file profile must be configured with the following Options:
    • File Type = Delimited
    • File Delimiter = Comma Delimited
    • Text Qualifier = Double Quote (optional, but double quotes will be written as part of the resulting cell value)
  • The flat file profile should only contain a single Record Type. Embedded line breaks and delimiters are not supported, even if properly escaped according to CSV syntax.
  • The resulting workbook will only contain a single sheet.
  • A separate workbook will be created for each document sent to the script.
  • There is no maximum number of rows or columns however the script has not been verified for very large workbooks.

 

Usage

  1. Map source data into the flat file profile to be converted by the script.
  2. Add a Data Process shape with a Custom Scripting step. Replace the entire script with the one below.
  3. If outputting the data to a file, set the appropriate file name (e.g. disk, FTP). Remember to configure the file name with the file extension ".xlsx" because the script will output the data in XLSX format.
  4. Write the Excel file out using the appropriate connector.

 

Script

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

for (int i = 0; i < dataContext.getDataCount(); i++) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);
Workbook wb = new XSSFWorkbook();
Sheet sheet = wb.createSheet("Sheet1");
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
int rownum = 0;

while ((line = reader.readLine()) != null) {
  Row row = sheet.createRow(rownum);
  rownum++;
  int cellnum = 0;
  String[] fields = line.split(",");
  for (int j = 0; j < fields.length; j++) {
   row.createCell(cellnum).setCellValue(fields[j]);
   cellnum++;
  }

}

ByteArrayOutputStream baos = new ByteArrayOutputStream();
wb.write(baos);
is = new ByteArrayInputStream(baos.toByteArray());
dataContext.storeStream(is, props);
}
14 people found this helpful

Attachments

    Outcomes