In the Start shape I retrieved a Word document. After running the test, when I click View Document in the connection data, the content is shown in what looks like some other language. How can I read the document in English?
It looks like this document was zipped before it reached this shape. Is this the first shape in a sub-process, and was the document zipped before calling that sub-process? Could you please paste your process for further analysis?
This is just a normal process, not a sub-process.
It is just the beginning of the process: I placed a Disk connector and retrieved a Word document, which is not zipped.
To my knowledge, the Document Viewer dialog can only display text-format data (XML, JSON, CSV, etc.). For any other format, you can save the file to disk by using the "Download Original Document" option in the Document Viewer. Make sure you rename the .dat extension to .docx or .doc before opening the file.
Thanks for your comments.
Have you tried checking the document after downloading it?
Yes, I checked the document after downloading it, and it shows the same unreadable data.
I did some digging around, and this is the default behavior of Boomi. It's the same as opening a Word document in a text editor. I found out that Apache POI has support for extracting text from Word documents, and it seems pretty straightforward; see "Apache POI - Text Extraction". Hope this is helpful!
Can you elaborate in detail?
Use custom scripting in a Data Process shape with the code from the POI example. You will have to modify the code a little and put the jar files in the Atom's lib directory. Here is a good example of using Groovy inside a Boomi process: "How to Validate XML against an XML Schema using Groovy".
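To illustrate why the viewer shows unreadable content, here is a minimal Java sketch that uses only the standard library rather than Apache POI. It assumes the file is a .docx (a ZIP container whose main text lives in word/document.xml); the older binary .doc format is not covered. The class and method names (DocxTextSketch, extractText, sampleDocx) are made up for this illustration, and it is not a replacement for POI's extractors: it ignores paragraph breaks and does not decode XML entities.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of Boomi or Apache POI.
class DocxTextSketch {
    // Matches text runs such as <w:t>...</w:t> or <w:t xml:space="preserve">...</w:t>.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    // Reads word/document.xml from the ZIP container and concatenates its text runs.
    static String extractText(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
                StringBuilder text = new StringBuilder();
                Matcher m = TEXT_RUN.matcher(xml);
                while (m.find()) {
                    text.append(m.group(1));
                }
                return text.toString();
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    // Builds a tiny in-memory ZIP that mimics the relevant part of a .docx (demo only).
    static byte[] sampleDocx(String sentence) throws IOException {
        String xml = "<w:document><w:body><w:p><w:r><w:t>" + sentence
                + "</w:t></w:r></w:p></w:body></w:document>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = sampleDocx("Hi Friend! How are you welcome to my world?");
        System.out.println(extractText(new ByteArrayInputStream(docx)));
    }
}
```

The raw bytes of the container are compressed binary, which is why they look like another language in a text viewer. In a Boomi process, logic like this would have to be adapted to the Data Process shape's scripting interface, which is not shown here.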
Can you send me your document so that I can try it for you?
Ashok, below is the content written in the Word document:
Hi Friend! How are you welcome to my world?
Try changing the Atom's encoding setting to make sure that the Japanese characters in the inbound file are read and processed correctly.
Add the following line to the /bin/atom.vmoptions file if you are running the Atom as a service: -Dfile.encoding=utf-8
Try it out !
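As a rough illustration of what the file.encoding setting influences, the following Java sketch decodes the same UTF-8 bytes with two different charsets. The class name EncodingSketch and the sample string are assumptions made for this example; it only demonstrates the general JVM behavior and is not Boomi-specific code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Boomi.
class EncodingSketch {
    // Decodes raw bytes using an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "konnichiwa" in hiragana, written with Unicode escapes so that the
        // source file itself does not depend on any particular encoding.
        String original = "\u3053\u3093\u306b\u3061\u306f";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // The JVM default charset is what -Dfile.encoding is meant to influence.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // The correct charset restores the text; the wrong one yields mojibake.
        System.out.println("as UTF-8:      " + decode(utf8Bytes, StandardCharsets.UTF_8));
        System.out.println("as ISO-8859-1: " + decode(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that this setting only affects how text is decoded. A Word document is a binary file, so changing the encoding would not be expected to make its raw bytes readable as text.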