XML Parsing In Shell Script


March 18, 2021 Hakan CentOS 9 Comments

I have a challenge I am interested in getting feedback on.

I will regularly download a series of data files from the web, where the data is in XML format. The format is known in advance but differs between the various data files. I then plan to extract the various data items (“elements?”) from each data file, do some light formatting and then save the desired parts of each original data file as a formatted CSV file for later import into a database.

As the plan is to use a bash shell script using curl to get the files, I have begun looking at external XML parsers that I can call from my script, perhaps specify which elements I want, get the data back in some kind of bash data structure and finally format and save as CSV-files.

There seem to be a number of XML parsers available, but perhaps someone on the list has a recommendation for which one might suit my needs best? I should add that I am running CentOS 7.

Thank you.

9 thoughts on - XML Parsing In Shell Script

Paul Heinlein says:

March 18, 2021 at 3:32 pm

Will you be using an XSLT stylesheet to do the work? There’s a somewhat steep learning curve, but in my experience it’s the most reliable method for parsing XML except in the very simplest of cases.

In that case, the libxslt stuff may be what you want:

http://xmlsoft.org/libxslt/

The command-line tool is xsltproc.

Again, it’s not easy to use, but once you’ve built a toolchain, it will be reliable and fairly easy to modify if the source XML schema changes.
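
As a rough illustration of the xsltproc approach, here is a minimal sketch that turns a small XML file into CSV. The element names (readings, reading, date, value), the file names, and the helper functions xml_to_csv and write_demo_files are all assumptions made up for this example; they do not come from the thread or from any real data feed. The sketch also assumes the values contain no commas or quotes, so no CSV escaping is done.

```shell
#!/usr/bin/env bash
# Sketch only: element names, file names, and helpers are hypothetical.

# Run a stylesheet against an XML file; the CSV goes to stdout.
# $1 = stylesheet path, $2 = XML file path
xml_to_csv() {
  xsltproc "$1" "$2"
}

# Write a demo stylesheet and a matching demo input into directory $1.
write_demo_files() {
  cat > "$1/readings.xsl" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text instead of XML -->
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/readings">
    <!-- Header row, then one row per reading element -->
    <xsl:text>date,value&#10;</xsl:text>
    <xsl:for-each select="reading">
      <xsl:value-of select="date"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="value"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
EOF
  cat > "$1/readings.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<readings>
  <reading><date>2021-03-18</date><value>1.5</value></reading>
  <reading><date>2021-03-19</date><value>2.25</value></reading>
</readings>
EOF
}
```

With the demo files written to some directory, xml_to_csv "$dir/readings.xsl" "$dir/readings.xml" should print a header line followed by one comma-separated line per reading. Keeping the stylesheet in its own file means a schema change only touches the stylesheet, not the calling script.
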
Hakan says:

March 18, 2021 at 4:09 pm

I just checked, and I cannot see that the organization publishing these data files offers any XSLT stylesheet. IOW, I am assuming, perhaps incorrectly, that the publisher of the data would be the one to provide said stylesheet. (Although perhaps that is something an end user could put together as well?)

Although the data format of each data series is unique, it is simple and could conceivably be parsed using grep but I am looking for a more “forward-looking” solution for other applications in the future.

If XSLT stylesheets are not available – would you suggest another tool? Or would you suggest I design the sheets myself, presumably one for each data series?
Paul Heinlein says:

March 18, 2021 at 4:54 pm

Some high-profile XML schemata (e.g., DocBook) have published stylesheets, but mostly I’ve written my own. I have a very trivial example in a blog post from several years ago:

https://www.madboa.com/blog/2014/09/10/strip-rss/

(My site is completely non-commercial. I gain nothing by you visiting it — or ignoring it.)
Hakan says:

March 18, 2021 at 7:19 pm

I looked at your link above and the one in your previous e-mail – looks very promising!

I will take a look at creating an XSLT stylesheet over the weekend and try creating a CSV file in the desired format.

Thank you!
Fabian Arrotin says:

March 19, 2021 at 11:40 am

In the past I have used xmlstarlet (available in EPEL) for quick parsing from within bash scripts. For something more robust, maybe switch to Python? (ymmv)
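
For a sense of what a quick xmlstarlet extraction might look like, here is a small sketch. The element names (readings, reading, date, value) and the function name readings_to_csv are hypothetical; the sel options used are -T (text output), -t (start a template), -m (match nodes), -v (print a value), -o (print a literal string) and -n (print a newline). On some systems the binary is installed as xml rather than xmlstarlet.

```shell
#!/usr/bin/env bash
# Sketch only: element names and the function name are hypothetical.

# Print one "date,value" line per <reading> element.
# $1 = XML file path
readings_to_csv() {
  xmlstarlet sel -T -t \
    -m '/readings/reading' \
    -v 'date' -o ',' -v 'value' -n \
    "$1"
}
```

This keeps everything on one command line, which suits simple formats; once the output needs headers, conditionals or escaping, a stylesheet is usually easier to maintain.
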
Johnny Hughes says:

March 19, 2021 at 2:25 pm

On 19.03.21 at 17:40, Fabian Arrotin wrote:

Just to grab a single value, use xmllint (it’s in the libxml2 package):

Example:

XML input:

<methodResponse><params><param><value><string>OK</string></value></param></params></methodResponse>

bash var:

STATUS=$(echo "${RESPONSE}" | xmllint --format --xpath "//methodResponse/params/param/value/string/text()" - 2>/dev/null)
Hakan says:

March 19, 2021 at 6:50 pm

I created an XSLT stylesheet for the data file I tried this on and it worked beautifully. I think the extra time spent designing a stylesheet is time well spent for any future changes to the data format.

Thank you!
Hakan says:

March 19, 2021 at 7:25 pm

Thank you. I decided to put together an XSLT stylesheet for each data file format; I think this might be the best approach for the future.
Hakan says:

March 19, 2021 at 7:26 pm

I wanted to do this in bash and decided on calling xsltproc while investing in writing an XSLT stylesheet for each data file format.
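
A driver for that workflow could look roughly like the sketch below: fetch one data file with curl, then run it through the stylesheet written for its format. The function name fetch_and_convert, and any URL or file names passed to it, are assumptions for illustration and not details from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and its arguments are hypothetical.

# Download one XML file and convert it with the matching stylesheet.
# $1 = source URL, $2 = stylesheet path, $3 = output CSV path
fetch_and_convert() {
  local url="$1" xsl="$2" out="$3" tmp
  tmp=$(mktemp) || return 1
  # curl: -f fail on HTTP errors, -sS quiet but still report errors,
  #       -L follow redirects, -o write the download to the temp file.
  # xsltproc: -o writes the transformation result to the given file.
  if curl -fsSL -o "$tmp" "$url" && xsltproc -o "$out" "$xsl" "$tmp"; then
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

A call might look like fetch_and_convert "https://example.org/series-a.xml" series-a.xsl series-a.csv, repeated once per data series with its own stylesheet. Returning a non-zero status on any failure lets the calling script decide whether to skip that series or stop.
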
