parse-entities

Parse HTML character references.

What is this?

This is a small and powerful decoder of HTML character references (often called entities).

When should I use this?

You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits distinct warnings, each with a reason explaining why and positional info on where it happened.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-entities

In Deno with esm.sh:

import {parseEntities} from 'https://esm.sh/parse-entities@4'

In browsers with esm.sh:

<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@4?bundle'
</script>

Use

import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel

API

This package exports the identifier parseEntities. It also exports the TypeScript types Options, ReferenceHandler, TextHandler, and WarningHandler. There is no default export.

`Options`

Configuration (TypeScript type).

Fields

`options.additional`

Additional character to accept (string, default: ''). When this character directly follows an ampersand, the ampersand is not treated as the start of a character reference, and no warning is emitted.

`options.attribute`

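Whether to parse value as an attribute value (boolean, default: false). This results in slightly different behavior: as the HTML spec requires for attribute values, a nonterminated named reference that is followed by `=` or an alphanumeric character is left as-is instead of being decoded.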

`options.nonTerminated`

Whether to allow nonterminated references (boolean, default: true). For example, &copycat decodes to ©cat. This behavior is compliant with the spec but can lead to unexpected results.

`options.position`

Starting position of value (Point or Position, optional). Useful when dealing with values nested in some sort of syntax tree. The default is:

{line: 1, column: 1, offset: 0}

`options.referenceContext`

Context used when calling reference (unknown, optional).

`options.reference`

Reference handler (ReferenceHandler, optional).

`options.textContext`

Context used when calling text (unknown, optional).

`options.text`

Text handler (TextHandler, optional).

`options.warningContext`

Context used when calling warning (unknown, optional).

`options.warning`

Error handler (WarningHandler, optional).

`parseEntities(value[, options])`

Parse HTML character references.

Parameters

value (string) — value to decode
options (Options, optional) — configuration

Returns

Decoded value (string).

`ReferenceHandler`

Character reference handler.

Parameters

this (*) — refers to referenceContext when given to parseEntities
value (string) — decoded character reference
position (Position) — place where source starts and ends
source (string) — raw source of character reference

`TextHandler`

Text handler.

Parameters

this (*) — refers to textContext when given to parseEntities
value (string) — string of content
position (Position) — place where value starts and ends

`WarningHandler`

Error handler.

Parameters

this (*) — refers to warningContext when given to parseEntities
reason (string) — human-readable reason for emitting a parse error
point (Point) — place where the error occurred
code (number) — machine-readable code of the error

The following codes are used:

Code	Example	Note
`1`	`foo &amp bar`	Missing semicolon (named)
`2`	`foo &#123 bar`	Missing semicolon (numeric)
`3`	`Foo &bar baz`	Empty (named)
`4`	`Foo &#`	Empty (numeric)
`5`	`Foo &bar; baz`	Unknown (named)
`6`	`Foo &#128; baz`	Disallowed reference
`7`	`Foo &#xD800; baz`	Prohibited: outside permissible unicode range

Compatibility

This project is compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, parse-entities@4, compatible with Node.js 16.

Security

This package is safe: it follows the HTML spec when parsing character references.

Related

wooorm/stringify-entities — encode HTML character references
wooorm/character-entities — info on character references
wooorm/character-entities-html4 — info on HTML4 character references
wooorm/character-entities-legacy — info on legacy character references
wooorm/character-reference-invalid — info on invalid numeric character references

Contribute

Yes please! See How to Contribute to Open Source.

License
