
XMLReader


XMLReader is

  • an object mode Transform stream
  • consuming XMLLexer's output (distinct XML tags and text fragments),
  • transforming incoming strings into XMLNode objects,
  • adjusting the data (see Data transformation below),
  • optionally transforming it in place with the provided map function,
  • and, finally:
    • either pushing results to the Readable's output,
    • or emitting them as SAX events (if there is at least one subscriber).

Usage

const {XMLReader} = require ('xml-toolkit')

const reader = new XMLReader ({...options}).process (readableStreamOrStringOrBuffer)

// scanning through the content:
for await (const e of reader) console.log (e)

// getting just one node:
const theNode = await reader.findFirst () // `null` unless found

Options

| Name | Default | Description |
| --- | --- | --- |
| useEntities | true | If true, the EntityResolver is in use; otherwise &...; entities may occur in the output |
| useNamespaces | true | If true, all element attributes are scanned for xmlns... prefixes |
| filter | (xmlNode) => true | If set, this function is called for each XMLNode before pushing it out. Unless it returns a truthy value, the push is skipped. Think Array.filter |
| filterElements | | Same as filter, but adds the condition type='EndElement', so only element nodes with already parsed children pass. Can be set as a string instead of a function: in that case, it acts as a filter on localname |
| stripSpace | true if filterElements is set, otherwise false | If true, text fragments are trimmed |
| map | (xmlNode) => any | If set, this function is called for each XMLNode, transforming it in place before pushing it out. Think Array.map |
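
For illustration, here is a minimal sketch of the string form of filterElements; the sample document is made up, and the expected output follows from the descriptions above:

```js
const {XMLReader} = require ('xml-toolkit')

// push out only fully parsed <item> elements; the surrounding whitespace is
// dropped because stripSpace defaults to true when filterElements is set
const reader = new XMLReader ({
  filterElements: 'item', // string form: acts as a filter on localname
}).process ('<list>\n  <item>a</item>\n  <item>b</item>\n</list>')

for await (const e of reader) console.log (e.type) // 'EndElement' for each <item>
```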

Computed properties

| Name | Type | Description |
| --- | --- | --- |
| isSAX | Boolean | true iff there is at least one subscriber for some SAXEvent type. In that case, SAX events are emitted instead of the Readable's data events. |
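
As a hedged sketch of SAX mode: attaching at least one listener for a SAXEvent type makes isSAX true, so the reader emits SAX events instead of data events. The event name below is an assumption based on the SAXEvent type names mentioned on this page:

```js
const {XMLReader} = require ('xml-toolkit')

const reader = new XMLReader ()

// assumption: the emitted event name matches the SAXEvent type, e.g. 'StartElement'
reader.on ('StartElement', node => console.log (node))

reader.process ('<greeting>hello</greeting>') // reader.isSAX is now true
```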

Methods

process (src [, lexerOptions])

Creates an XMLLexer with lexerOptions, pipes it to this XMLReader, and parses the provided src.

| Name | Type | Description |
| --- | --- | --- |
| src | Buffer, String or Readable | XML to parse |
| lexerOptions | Object | See XMLLexer#options |

Return value: the XMLReader object (for chaining).

async findFirst ()

Waits for the first XMLNode to occur, returns it, and destroys the stream.

Using this method makes sense mostly with the filterElements or filter option set. Without them, it returns the StartElement node for the document root, with attributes but without any children.

Returns null if the end of the XML (the EndDocument event) is reached without any node found.

Throws an error if some SAX event type listener was set prior to calling findFirst.
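
For example, a minimal sketch (the sample markup is made up for illustration):

```js
const {XMLReader} = require ('xml-toolkit')

// fetch the first fully parsed <leaf> element, then stop reading
const leaf = await new XMLReader ({filterElements: 'leaf'})
  .process ('<root><leaf/><leaf/></root>')
  .findFirst ()

console.log (leaf) // an 'EndElement' XMLNode, or null if no <leaf> was found
```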

Data transformation

Event substitution

  • on stream end, an 'EndDocument' event with empty src and xml is published;
  • for EndElement tags, a copy of the StartElement XMLNode is published, with the same attributes but an altered type field;
    • for self-enclosed elements, too, XMLNodes are published twice: as StartElement and as EndElement.
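
To illustrate, here is a sketch that logs the node types produced for a tiny document; the exact sequence is inferred from the rules above:

```js
const {XMLReader} = require ('xml-toolkit')

const reader = new XMLReader ().process ('<root><empty/></root>')

for await (const e of reader) console.log (e.type)
// expected, per the rules above:
//   StartElement (root)
//   StartElement (empty)  -- self-enclosed, still published twice
//   EndElement   (empty)
//   EndElement   (root)
//   EndDocument
```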

Text aggregation

Sequences of text/CDATA fragments are reported as atomic Characters events.

If the useEntities option is on (the default), Characters fragments are transformed by the EntityResolver (CDATA fragments never are).

To drop insignificant whitespace, use the stripSpace option. When it is set to true, every aggregated text fragment is trimmed, and fragments that become empty are ignored completely. So, for example, <foo/>\n\n<bar/> yields no Characters at all, but within a <![CDATA[cdata]]> section spaces are left in place.
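
A sketch contrasting plain text and CDATA under stripSpace; the expected behavior is inferred from the description above:

```js
const {XMLReader} = require ('xml-toolkit')

const xml = '<doc><foo/>\n\n<bar/><baz><![CDATA[  kept  ]]></baz></doc>'

// the whitespace-only fragment between <foo/> and <bar/> yields no Characters
// node at all, while the CDATA content keeps its spaces
const reader = new XMLReader ({stripSpace: true}).process (xml)

for await (const e of reader)
  if (e.type === 'Characters') console.log (e)
```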

Nodes subtree collection

XMLReader works in a manner similar to SAX parsers, emitting a flat stream of objects. They are XMLNodes, not just SAXEvents, but each element is published twice: once with the 'StartElement' type, then with 'EndElement'. In the first case, the children list is naturally empty; in the latter, it may be filled in.

For the sake of memory efficiency, with the filter / filterElements options in effect, children are collected only for nodes matching the filter conditions. So, for example, parsing <root><leaf /></root> with filterElements='leaf' yields the single leaf node pointing to its parent root, which has an empty children list. This makes it safe to parse huge XML with millions of sibling leaves without risking memory overflow, but of course you can't get the complete DOM tree in that case.
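
As a sketch, streaming over many sibling leaves with a bounded memory footprint (the file name is hypothetical):

```js
const {XMLReader} = require ('xml-toolkit')
const fs = require ('fs')

// each <leaf> arrives as a single fully parsed node; already processed
// siblings are not retained, so memory stays bounded
const reader = new XMLReader ({filterElements: 'leaf'})
  .process (fs.createReadStream ('huge.xml'))

let count = 0
for await (const leaf of reader) count++
console.log (count)
```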

Comparison to XMLParser

XMLParser and XMLReader are both high-level XML parsers producing XMLNodes. But:

| Name | Proto | XML Source | Pro | Contra |
| --- | --- | --- | --- | --- |
| XMLReader | Transform | Readable | allows scanning huge XML with a limited memory footprint | asynchronous by nature |
| XMLParser | none | String | can be used in synchronous contexts, e.g. in object constructors | limited size XML only |

So, XMLReader vs. XMLParser is basically like fs.createReadStream vs. fs.readFileSync.
