Parse HTML character references.
- What is this?
- When should I use this?
- Install
- Use
- API
  - Options
- Compatibility
- Security
- Related
- Contribute
- License
This is a small and powerful decoder of HTML character references (often called entities).
You can use this for spec-compliant decoding of character references. It's small and fast enough to do that well. You can also use this when making a linter, because it emits warnings that carry a reason and positional info on where they happened.
This package is ESM only. In Node.js (version 16+), install with npm:

```sh
npm install parse-entities
```

In Deno with esm.sh:

```js
import {parseEntities} from 'https://esm.sh/parse-entities@3'
```

In browsers with esm.sh:

```html
<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
```

```js
import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
```

This package exports the identifier
parseEntities.
It also exports the TypeScript types
Options,
ReferenceHandler,
TextHandler, and
WarningHandler.
There is no default export.
`Options`: configuration (TypeScript type), with the following fields:

- `additional` (`string`, default: `''`): additional character to accept. This allows other characters, without error, when following an ampersand.
- `attribute` (`boolean`, default: `false`): whether to parse `value` as an attribute value. This results in slightly different behavior.
- `nonTerminated` (`boolean`, default: `true`): whether to allow nonterminated references. For example, `&copycat` for `©cat`. This behavior is compliant with the spec but can lead to unexpected results.
- `position` (`Point` or `Position`, optional): starting position of `value`. Useful when dealing with values nested in some sort of syntax tree. The default is `{line: 1, column: 1, offset: 0}`.
- `referenceContext` (`unknown`, optional): context used when calling `reference`.
- `reference` (`ReferenceHandler`, optional): reference handler.
- `textContext` (`unknown`, optional): context used when calling `text`.
- `text` (`TextHandler`, optional): text handler.
- `warningContext` (`unknown`, optional): context used when calling `warning`.
- `warning` (`WarningHandler`, optional): error handler.
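
As a rough sketch of how the configuration might be assembled for a value that lives inside a syntax tree, the snippet below builds an options object from a node. It assumes the option fields are named `position`, `attribute`, and `nonTerminated`, and that the node carries a `position.start` point. `optionsFromNode` and the sample node are hypothetical and not part of this package.

```js
// Hypothetical helper (not part of parse-entities): builds an options object
// for decoding a value taken from a syntax-tree node.
// Assumes `node.position.start` is a point like {line, column, offset}.
function optionsFromNode(node, overrides = {}) {
  const start = node && node.position ? node.position.start : undefined

  return {
    // Fall back to the documented default when the node has no position.
    position: start
      ? {line: start.line, column: start.column, offset: start.offset}
      : {line: 1, column: 1, offset: 0},
    // Documented defaults, spelled out so the intent is visible.
    attribute: false,
    nonTerminated: true,
    // Caller-supplied fields win over the defaults above.
    ...overrides
  }
}

// A made-up node, for illustration only.
const node = {
  type: 'text',
  value: 'a &amp; b',
  position: {
    start: {line: 3, column: 5, offset: 42},
    end: {line: 3, column: 14, offset: 51}
  }
}

const options = optionsFromNode(node, {nonTerminated: false})
console.log(options.position)
```

The result would be passed as the second argument, as in `parseEntities(node.value, options)`, so that positions given to the handlers should be relative to the surrounding document instead of to the value alone.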
`parseEntities(value[, options])`: parse HTML character references.

Parameters:

- `value` (`string`): value to decode
- `options` (`Options`, optional): configuration

Returns the decoded value (`string`).
`ReferenceHandler`: character reference handler.

- `this` (`*`): refers to `referenceContext` when given to `parseEntities`
- `value` (`string`): decoded character reference
- `position` (`Position`): place where `source` starts and ends
- `source` (`string`): raw source of character reference
`TextHandler`: text handler.

- `this` (`*`): refers to `textContext` when given to `parseEntities`
- `value` (`string`): string of content
- `position` (`Position`): place where `value` starts and ends
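
To make the two handler shapes concrete, here is a minimal sketch that records every call as a token. It relies only on the parameter lists described here: `(value, position)` for text and `(value, position, source)` for references. `createTokenCollector` is a hypothetical helper, and the positions in the demo are made up.

```js
// Hypothetical helper (not part of parse-entities): creates `text` and
// `reference` handlers that record what they are called with.
function createTokenCollector() {
  const tokens = []

  return {
    tokens,
    // Same parameters as a TextHandler: (value, position).
    text(value, position) {
      tokens.push({type: 'text', value, position})
    },
    // Same parameters as a ReferenceHandler: (value, position, source).
    reference(value, position, source) {
      tokens.push({type: 'reference', value, source, position})
    }
  }
}

// Calling the handlers by hand to show the data they receive.
// These positions are invented for illustration.
const collector = createTokenCollector()

collector.text('echo ', {
  start: {line: 1, column: 1, offset: 0},
  end: {line: 1, column: 6, offset: 5}
})
collector.reference(
  '©',
  {start: {line: 1, column: 6, offset: 5}, end: {line: 1, column: 12, offset: 11}},
  '&copy;'
)

console.log(collector.tokens.map((token) => token.type + ':' + token.value))
// => [ 'text:echo ', 'reference:©' ]
```

With the package installed, the same handlers could be given as `parseEntities(value, {text: collector.text, reference: collector.reference})`. They close over `tokens` and do not use `this`, so no context option is needed for this sketch.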
`WarningHandler`: error handler.

- `this` (`*`): refers to `warningContext` when given to `parseEntities`
- `reason` (`string`): human-readable reason for emitting a parse error
- `point` (`Point`): place where the error occurred
- `code` (`number`): machine-readable code of the error
The following codes are used:
| Code | Example | Note |
|---|---|---|
| 1 | `foo &amp bar` | Missing semicolon (named) |
| 2 | `foo &#123 bar` | Missing semicolon (numeric) |
| 3 | `Foo &bar baz` | Empty (named) |
| 4 | `Foo &#` | Empty (numeric) |
| 5 | `Foo &bar; baz` | Unknown (named) |
| 6 | `Foo &#128; baz` | Disallowed reference |
| 7 | `Foo &#xD800; baz` | Prohibited: outside permissible unicode range |
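
For a linter, the codes can be mapped to short labels and gathered by a warning handler. The sketch below assumes the `(reason, point, code)` parameters described for the error handler. `warningLabels`, `createWarningCollector`, and `formatWarning` are hypothetical names, and the reason string in the demo is a placeholder, not the library's actual message.

```js
// Labels copied from the table of codes.
const warningLabels = {
  1: 'Missing semicolon (named)',
  2: 'Missing semicolon (numeric)',
  3: 'Empty (named)',
  4: 'Empty (numeric)',
  5: 'Unknown (named)',
  6: 'Disallowed reference',
  7: 'Prohibited: outside permissible unicode range'
}

// Hypothetical helper (not part of parse-entities): creates a `warning`
// handler that stores each warning together with its label.
function createWarningCollector() {
  const warnings = []

  return {
    warnings,
    // Same parameters as a WarningHandler: (reason, point, code).
    warning(reason, point, code) {
      warnings.push({
        code,
        label: warningLabels[code] || 'Unknown code',
        reason,
        line: point.line,
        column: point.column
      })
    }
  }
}

// Formats one collected warning as `line:column [code] label: reason`.
function formatWarning(warning) {
  return `${warning.line}:${warning.column} [${warning.code}] ${warning.label}: ${warning.reason}`
}

// Calling the handler by hand; the reason and point are placeholders.
const collector = createWarningCollector()
collector.warning('example reason', {line: 1, column: 9, offset: 8}, 1)

console.log(collector.warnings.map(formatWarning))
// => [ '1:9 [1] Missing semicolon (named): example reason' ]
```

With the package installed, the handler could be passed as `parseEntities(value, {warning: collector.warning})`, after which `collector.warnings` would hold whatever the parser reported.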
This project is compatible with maintained versions of Node.js.
When we cut a new major release,
we drop support for unmaintained versions of Node.
This means we try to keep the current release line,
parse-entities@4,
compatible with Node.js 16.
This package is safe: it matches the HTML spec to parse character references.
- wooorm/stringify-entities: encode HTML character references
- wooorm/character-entities: info on character references
- wooorm/character-entities-html4: info on HTML4 character references
- wooorm/character-entities-legacy: info on legacy character references
- wooorm/character-reference-invalid: info on invalid numeric character references
Yes please! See How to Contribute to Open Source.
MIT © Titus Wormer