
XMLParser

do- edited this page Sep 28, 2024 · 17 revisions

XMLParser is a... synchronous XML parser.

It takes the source XML text as a String (which must be loaded in its entirety) and returns an XMLNode instance representing its root (document) element, with the whole hierarchy filled in.

Only elements and text/CDATA nodes are present in the output; everything else (comments, the prolog, processing instructions and exotic constructs such as DTDs) is ignored.

In application code, it is often more convenient to normalize the raw XMLNode than to use it directly, either with:

  • the detach instance method (as shown in the example below) or
  • the XMLNode.toObject standalone transformer.

Usage

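const fs = require ('fs')
const {XMLParser} = require ('xml-toolkit')

// process () takes a String: pass an encoding, otherwise readFileSync returns a Buffer
const xml = fs.readFileSync ('doc.xml', 'utf8')

// any of the options described below; an empty object keeps the defaults
const options = {}
const parser  = new XMLParser (options)

// the XMLNode representing the document element
const document = parser.process (xml)

// normalize the raw XMLNode with detach () before walking its children
for (const element of document.detach ().children) {
  console.log (element.attributes)
}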

Options

Name           Default  Description
useEntities    true     If true, entity references are resolved by EntityResolver; otherwise &...; may occur in the output
useNamespaces  true     If true, all element attributes are scanned for xmlns... prefixes
stripSpace     true     If true, text fragments are trimmed

Methods

process (src)

...

Name  Type    Description
src   String  XML to parse

Return value: the XMLNode object representing the document element.

Data transformation

Consecutive text/CDATA fragments are concatenated into single, atomic Characters nodes.

If the useEntities option is on (the default), Characters fragments are processed by EntityResolver; CDATA content never is.

To drop insignificant whitespace, use the stripSpace option. When it is set to true, every aggregated text fragment is trimmed, and fragments left empty are dropped completely. For example, <foo/>\n\n<bar/> yields no Characters nodes at all, whereas spaces inside a <![CDATA[cdata]]> section are left in place.

Comparison to XMLIterator

XMLParser and XMLIterator are both synchronous XML parsers that read a complete String. But:

Name         Proto     Pro                                            Contra
XMLIterator  Iterable  scans XML tag by tag, allows early completion  doesn't build a hierarchy
XMLParser    none      simple                                         always allocates the complete document tree in memory

In fact, XMLParser is built on top of XMLIterator.

Comparison to XMLReader

XMLParser and XMLReader are both high-level XML parsers producing XMLNodes. But:

Name       Proto      XML Source  Pro                                                          Contra
XMLReader  Transform  Readable    can scan huge XML with a limited memory footprint            asynchronous by nature
XMLParser  none       String      usable in synchronous contexts, e.g. in object constructors  only XML of limited size

So, XMLReader vs. XMLParser is basically like fs.createReadStream vs. fs.readFileSync.
