Failed Generating XML Document: Invalid white space character

Document created by thanh_n88 (Employee) on Jul 31, 2018. Last modified by thanh_n88 (Employee) on Aug 1, 2018. Version 4.

Reference Article

 

Sample Full Error

Unable to store data, error copying stream. Caused by: Failed generating xml document Caused by: Invalid white space character <<some_character>> in text to output

 

Cause

This error comes up every so often when XML is involved, such as when writing to an XML profile. It typically indicates a data error: the data contains a character that is not valid in XML. XML allows only the following character range:

U+0009, U+000A, U+000D, [U+0020-U+D7FF], [U+E000-U+FFFD], and [U+10000-U+10FFFF] excluding the surrogate blocks, FFFE, and FFFF
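
As a rough illustration of the range above, here is a minimal Python sketch that checks a string against it. The function names is_xml_char and find_invalid_xml_chars are made up for this example and are not part of any product; the sketch only mirrors the listed range.

```python
def is_xml_char(ch):
    """Return True if ch falls inside the XML character range listed above."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # stops just before the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD        # leaves out FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(text):
    """Return (offset, code) pairs, e.g. (4, '0xb'), for each invalid character."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if not is_xml_char(ch)]


# Hypothetical sample data containing a vertical tab and a substitute character.
print(find_invalid_xml_chars("test\x0bdata\x1a"))
```

The codes are reported in the same 0x form that the error message uses, so they can be compared directly.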

Often, you'll see errors for characters like 0xb (000B, vertical tab), 0x1a (001A, substitute character), 0x8 (0008, backspace), etc., since they fall below 0020 and are not one of the three allowed control characters (0009, 000A, 000D). Usually you can see the character in Notepad++ if you have view symbols turned on, or it might show up as a visible symbol in the data.

Solution

There are a couple of ways to go about this, depending on how the characters are represented in the data. You can replace the character with whatever you want, or with nothing. The general rule is to search for \uXXXX, where XXXX is the four-digit hexadecimal number (0-9, A-F) of the character that is throwing the error. Ignore the leading "0x" part and pad with leading zeroes where there are no digits: 0x8 becomes \u0008, 0x1b becomes \u001b.

 

Note (thanks to James Ahlborn for providing this explanation): this type of error can sometimes result when a document is parsed with the wrong encoding, which causes valid characters in the correct encoding to become invalid characters in the incorrect encoding. This can happen when a UTF-8 encoded document with multi-byte characters is interpreted using a single-byte encoding (like ASCII or windows-1252). See the thread "Please Explain Character Encoding" for some good recommendations on handling character encoding correctly.

 

Example

You get the error: 

Unable to store data, error copying stream. Caused by: Failed generating xml document Caused by: Invalid white space character 0x17 in text to output

 

Here 0x17 is the character being flagged. Build the search string as follows:

  1. Remove the leading "0x", which leaves 17
  2. Add leading zeroes to "17" until there are 4 characters: 0017
  3. Add the \u in front, and we end up with \u0017 for the search
    Note: the hexadecimal letters (A-F) are not case-sensitive, so \u001B will work the same as \u001b
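
The three steps above can be sketched in Python as follows. The helper name to_unicode_escape is hypothetical, and the sketch assumes the code from the error message fits in four hex digits.

```python
def to_unicode_escape(hex_code):
    """Turn a code from the error message (e.g. '0x17') into a \\uXXXX search string."""
    digits = hex_code.strip()
    # Step 1: remove the leading "0x"
    if digits.lower().startswith("0x"):
        digits = digits[2:]
    int(digits, 16)  # raises ValueError if what is left is not hexadecimal
    if len(digits) > 4:
        raise ValueError("this sketch only handles codes up to four hex digits")
    # Step 2: pad with leading zeroes to 4 characters; step 3: add the \u in front
    return "\\u" + digits.zfill(4)


for code in ("0x17", "0x8", "0x1b"):
    print(code, "->", to_unicode_escape(code))
```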

 

Symbol

When the character is an actual symbol in your data, you can simply search for its Unicode escape in the data process, e.g. \u0008 for 0x8, \u000B for 0xb.
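
To show the idea outside of a process, here is a minimal sketch using Python's re module, which also understands \uXXXX in a search pattern. This is only an illustration of the search-and-replace, not how the platform performs it internally; remove_symbol is a made-up helper name.

```python
import re


def remove_symbol(text, escape, replacement=""):
    """Replace the actual character named by a \\uXXXX escape, e.g. r'\\u000B'."""
    # The regex engine reads \u000B in the pattern as the vertical tab character itself.
    return re.sub(escape, replacement, text)


# Hypothetical sample: an actual vertical tab sits between the two phrases.
data = "test data... \x0b test data..."
cleaned = remove_symbol(data, r"\u000B")
print(repr(cleaned))
```

Passing a third argument replaces the character with something else instead of removing it.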

 

 

Literal Code

When the code is written out literally as text in your data, e.g. "test data... \u000B test data...", then you need to search for the literal code by escaping the backslash, e.g. \\u000B.
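
A comparable sketch for the literal case, again using Python's re module purely as an illustration. The helper name remove_literal_code is hypothetical.

```python
import re


def remove_literal_code(text, escape, replacement=""):
    """Replace the written-out text of an escape (backslash, u, four hex digits)."""
    # Adding one more backslash turns \u000B into the pattern \\u000B, which
    # matches a literal backslash followed by the characters u000B.
    pattern = "\\" + escape
    return re.sub(pattern, replacement, text)


# Hypothetical sample: the data contains the six characters \u000B as plain text.
data = r"test data... \u000B test data..."
print(remove_literal_code(data, r"\u000B"))

# An actual vertical tab is left alone by this search.
print(repr(remove_literal_code("a\x0bb", r"\u000B")))
```

Because this search matches text rather than a character, it is case-sensitive in this sketch: searching for \\u000B will not match a lowercase \u000b written in the data unless the search is made case-insensitive.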

 

When you have a lot of data

If you can easily see your data, you can test it and will be able to find the character in question. When you have a lot of data (or cannot open the file because it is too large) and you want to see where the character resides, you'll need to do a little extra work. Test mode might not be able to complete the process and show you where the error is (because of test mode limits), so you will need to deploy the process so that it can go through all of the data.

 

The general idea is to split your data up into smaller chunks. If you are transforming JSON to XML and you're getting this error, then you could split up the JSON via split documents in the data process and send each piece to a try/catch to find out which piece of data is causing it.

 

Here I put the data locally on my computer, picked it up, split it by an array element (you should split where it makes sense), and sent that down the process. The try/catch sends the data to the map. If the source data does not have the invalid characters, then it will complete. If it does have invalid data, then the process will send it down the catch path. I chose to combine all of the data causing issues into one document and write it to my disk so I can retrieve it.
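
The same split-and-catch idea can be sketched outside of a process in Python. Everything here is an assumption made for illustration: the input is taken to be a JSON array, convert_to_xml is a stand-in that only checks for invalid characters rather than running a real map, and all function names are made up.

```python
import json


def iter_strings(value):
    """Yield every string (keys and values) found inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def convert_to_xml(record):
    """Stand-in for the map step: fail on the first character XML does not allow."""
    for text in iter_strings(record):
        for ch in text:
            cp = ord(ch)
            valid = (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
                     or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)
            if not valid:
                raise ValueError("invalid XML character " + hex(cp))


def split_and_check(json_text):
    """Split a JSON array into elements and sort them into good and bad piles."""
    good, bad = [], []
    for index, record in enumerate(json.loads(json_text)):
        try:
            convert_to_xml(record)           # the "try" path
            good.append(record)
        except ValueError as err:            # the "catch" path
            bad.append((index, str(err)))
    return good, bad
```

Calling split_and_check on the text of a JSON array returns the elements that passed along with the index and flagged code of each element that failed, which could then be combined and written to disk in the same spirit as the catch path described above.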
