- Unicode® character table, if you are interested in what the characters are
Sample Full Error
Unable to store data, error copying stream. Caused by: Failed generating xml document Caused by: Invalid white space character <<some_character>> in text to output
This error comes up every so often when XML is involved, such as when writing to an XML profile. It typically indicates a data error: the data contains a character that is not valid in XML. XML has a specified character range of:
U+0009, U+000A, U+000D, [U+0020-U+D7FF], [U+E000-U+FFFD], and [U+10000-U+10FFFF] excluding the surrogate blocks, FFFE, and FFFF
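As an illustration, the allowed range above can be expressed as a small validity check. This is a minimal Python sketch (the function name is mine, not from the article):

```python
def is_xml_char(cp: int) -> bool:
    """True if the code point is allowed in XML 1.0 character data."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

print(is_xml_char(0x0B))      # vertical tab -> False
print(is_xml_char(ord("A")))  # -> True
```

Note that the surrogate range (U+D800 to U+DFFF) and U+FFFE/U+FFFF fall between the allowed ranges, so the check excludes them automatically.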
Often, you'll see errors for characters like 0xb (000B, vertical tab), 0x1a (001A, substitute character), or 0x8 (0008, backspace), since they fall below 0020. You can usually see the character in Notepad++ if you have View > Show Symbol enabled, or it may render as a visible placeholder symbol.
There are a couple of ways to go about this, depending on how the characters are represented in the data. You can replace them with whatever you want, or with nothing. The general rule is to search for \uFFFF, where FFFF is the hexadecimal number (0-9, A-F) of the character that is throwing the error. Drop the leading "0x" and pad with zeroes until there are four digits: 0x8 becomes \u0008, and 0x1b becomes \u001b.
Note (thanks to James Ahlborn for providing this explanation): this type of error can also result when a document is parsed with the wrong encoding, causing characters that are valid in the correct encoding to become invalid in the incorrect one. This can happen when a UTF-8 encoded document containing multi-byte characters is interpreted using a single-byte encoding (such as ASCII or Windows-1252). See the thread Please Explain Character Encoding for good recommendations on handling character encoding correctly.
For example, say you get the error:
Unable to store data, error copying stream. Caused by: Failed generating xml document Caused by: Invalid white space character 0x17 in text to output
where 0x17 is being flagged. The search you'll be doing is:
- Remove the front "0x" and now you have 17
- Add zeroes to "17" until there are 4 characters, 0017
- Add the \u, and we end up with \u0017 for the search
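The three steps above can be sketched as a small helper; the function name here is hypothetical, not from the article:

```python
def to_search_escape(flagged: str) -> str:
    """Turn a flagged character like '0x17' into the \\uXXXX search form."""
    digits = flagged.removeprefix("0x")  # "0x17" -> "17"
    padded = digits.zfill(4)             # "17"   -> "0017"
    return "\\u" + padded                # -> r"\u0017"

print(to_search_escape("0x17"))  # \u0017
print(to_search_escape("0x8"))   # \u0008
```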
Note: for Unicode escapes that contain letters, lowercase and uppercase are interchangeable, so \u001B will work the same as \u001b.
When it is an actual symbol in your data, you can simply search for the Unicode equivalent in the data process, e.g. \u0008 for 0x8, \u000B for 0xb.
When the escape is written literally in your data, e.g. "test data... \u000B test data...", you need to search for the literal text instead, e.g. \\u000B.
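The distinction matters because the two searches match different things. A short Python/regex illustration of both cases (the sample strings are made up):

```python
import re

# Case 1: the data contains the actual control character U+000B.
actual = "test data\x0bmore data"
cleaned = re.sub("\u000B", "", actual)  # single backslash: the real character
print(cleaned)                          # test datamore data

# Case 2: the data contains the six literal characters "\u000B".
literal = r"test data... \u000B test data..."
cleaned2 = re.sub(r"\\u000B", "", literal)  # escaped backslash: the literal text
print(cleaned2)
```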
When you have a lot of data
If you can easily see your data, you can test the process and find the character in question. When you have a lot of data (or cannot open the file because it is too large) and you want to see where the character resides, you'll need to do a little extra work. Test mode might not be able to complete the process (because of test mode limits) and show you where the error is, so you will need to deploy the process so it can run through all of the data.
The general idea is to split your data into smaller chunks. If you are transforming JSON to XML and getting this error, you could split the JSON via Split Documents in a Data Process shape and send the pieces through a try/catch to find out which piece of data is the culprit.
Here I put the data locally on my computer, picked it up, split it by an array element (split wherever makes sense for your data), and sent each piece down the process. The try/catch sends the data to the map. Pieces without invalid characters complete; pieces with invalid characters go down the catch path. I chose to combine all of the problem documents into one and write it to disk so I can retrieve it.
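Outside of the process canvas, the same split-and-test idea can be sketched in Python. The record structure and field names below are hypothetical, purely for illustration:

```python
def is_xml_char(cp):
    """True if the code point is allowed in XML 1.0 character data."""
    return (cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def find_bad_records(records):
    """Check the data record by record and report which records would fail XML output."""
    bad = []
    for index, record in enumerate(records):
        offenders = {c for value in record.values() if isinstance(value, str)
                     for c in value if not is_xml_char(ord(c))}
        if offenders:
            bad.append((index, sorted(hex(ord(c)) for c in offenders)))
    return bad

# Hypothetical sample: the second record hides a 0x17 control character.
data = [{"name": "clean row"}, {"name": "bad\x17row"}]
print(find_bad_records(data))  # [(1, ['0x17'])]
```

This mirrors the try/catch approach: each record is tested independently, and only the failing ones are collected for review.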