How to design a process to handle non-UTF-8 characters

Document created by mike_c_frazier (Employee) on Dec 6, 2012. Last modified by Adam Arrowsmith on Mar 13, 2017 (Version 2).

First, use the Data Process > Character Decode shape in your process to decode the incoming data from its source character set into UTF-8. To get consistent results, you need to know the encoding of the source data. One way to find it is to open the source data in an editor such as Notepad++ and enable the "View->Hexadecimal and Ascii" option, which shows the hex value of each character alongside its ASCII value. Use those byte values to identify the encoding (a web search for the hex value of a problem character usually helps); it is most likely ISO-8859-1 or WINDOWS-1252. Then enter that encoding's canonical name (see Java 8 Supported Encodings) in the Data Process > Character Decode shape.
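To illustrate what the decode step does conceptually, here is a minimal Java sketch (not Boomi code) that assumes the source bytes are known to be Windows-1252. It prints the raw bytes as hex, much like the editor's hex view, then decodes them with the named charset and re-encodes the text as UTF-8. The class name `CharsetDecodeSketch` and the sample input are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetDecodeSketch {

    // Format bytes as space-separated uppercase hex, similar to an editor's hex view.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decode bytes with the named source charset (a canonical Java name such as
    // "ISO-8859-1" or "windows-1252") and re-encode the resulting text as UTF-8.
    // Malformed or unmappable input throws instead of silently becoming U+FFFD.
    static byte[] toUtf8(byte[] source, String sourceCharsetName) {
        CharsetDecoder decoder = Charset.forName(sourceCharsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String text = decoder.decode(ByteBuffer.wrap(source)).toString();
            return text.getBytes(StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Input is not valid " + sourceCharsetName, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: "café" saved by a Windows-1252 system (é is byte 0xE9).
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println("source bytes: " + toHex(raw));                          // 63 61 66 E9
        System.out.println("UTF-8 bytes:  " + toHex(toUtf8(raw, "windows-1252")));  // 63 61 66 C3 A9
    }
}
```

Decoding these same bytes as UTF-8 fails, because 0xE9 on its own is not valid UTF-8; that is the typical symptom of non-UTF-8 input. Note that ISO-8859-1 and Windows-1252 both map nearly every byte to some character, so picking the wrong one of the two rarely fails outright. Compare the decoded characters against what you expect instead: for example, byte 0x80 is € in Windows-1252 but a control character in ISO-8859-1.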


If the decode step is configured correctly and the process runs in the Dell Boomi Atom Cloud, no further changes are needed. If you run a local Atom, however, you may also need to add the following line to the <ATOM_INSTALL_ROOT>/bin/atom.vmoptions file (if the Atom runs in Desktop mode, edit the atomw.vmoptions file instead). This option sets the default character set of the Atom's Java virtual machine to UTF-8:


-Dfile.encoding=utf-8


Save the file and restart the Atom before re-executing the process.
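If you want to see what the option changes, the minimal Java sketch below (a standalone check, not part of the Atom) prints the JVM's default charset. Running it as `java -Dfile.encoding=utf-8 DefaultCharsetCheck` on Java 8 should report UTF-8, while running it without the option reports the platform default (often windows-1252 on Windows). The class name `DefaultCharsetCheck` is hypothetical, and this only shows the behavior of a JVM you launch yourself; it does not inspect the running Atom.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Name of the JVM's default charset, which -Dfile.encoding controls on Java 8.
    // (From JDK 18 onward the default is UTF-8 regardless, per JEP 400.)
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property and the charset the JVM actually resolved from it.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```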
