Wednesday, February 22, 2012

Message: An invalid XML character (Unicode: 0xb) was found in the CDATA section.

Once again, XML parsing failed because of a control character, this time ^K (vertical tab, 0x0B).

A simple fix on Unix is to strip such characters from the file in place with the following command:

perl -p -i -e 's/[\x00-\x08\x0b-\x0c\x0e-\x1f\x7f]//g' xmlfile.xml

Tuesday, January 03, 2012

Message: The entity "nbsp" was referenced, but not declared.

While parsing an XML file with StAX, I encountered this error.

The data was inside CDATA tags, so the error was puzzling. On further analysis I found that control characters (^M, the carriage return) were causing the problem.

To fix it on Unix/Linux, just remove the carriage returns. I found this nifty command to do it:
tr -d "\r" < inputfile > outputfile
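
To confirm that the file really contains carriage returns, and that they are gone afterwards, something like the sketch below may help. The function names are hypothetical, and it assumes a POSIX shell with grep and tr available.

```shell
# Hypothetical helper: print the number of lines containing a carriage return.
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0, so ignore its status.
    LC_ALL=C grep -c "$cr" "$1" || true
}

# Hypothetical helper: write a copy of $1 without carriage returns to $2.
# $2 must be a different file from $1, because the shell truncates
# the output file before tr starts reading.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example: `count_cr_lines inputfile`, then `strip_cr inputfile outputfile`, then `count_cr_lines outputfile`, which should print 0.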