Parse HTML character references

wooorm/parse-entities

parse-entities

Parse HTML character references.

What is this?

This is a small and powerful decoder of HTML character references (often called entities).

When should I use this?

You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits distinct warnings, each with a reason and positional info on where it occurred.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-entities

In Deno with esm.sh:

import {parseEntities} from 'https://esm.sh/parse-entities@4'

In browsers with esm.sh:

<script type="module">
  import {parseEntities} from 'https://esm.sh/parse-entities@4?bundle'
</script>

Use

import {parseEntities} from 'parse-entities'

console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo

console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta

console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel

API

This package exports the identifier parseEntities. It also exports the TypeScript types Options, ReferenceHandler, TextHandler, and WarningHandler. There is no default export.

Options

Configuration (TypeScript type).

Fields
options.additional

Additional character to accept (string, default: ''). This allows other characters, without error, when following an ampersand.

options.attribute

Whether to parse value as an attribute value (boolean, default: false). This results in slightly different behavior.

options.nonTerminated

Whether to allow nonterminated references (boolean, default: true). For example, &copycat for ©cat. This behavior is compliant with the spec but can lead to unexpected results.

options.position

Starting position of value (Point or Position, optional). Useful when dealing with values nested in some sort of syntax tree. The default is:

{line: 1, column: 1, offset: 0}
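
When the value comes from a node in a syntax tree, the node's start point can be handed over so that reported positions are relative to the whole document. The following is a minimal sketch under assumptions: the `node` shape and the `optionsFor` helper are hypothetical and not part of this package.

```javascript
// Default starting point, as documented above.
const defaultPoint = {line: 1, column: 1, offset: 0}

// Hypothetical helper: build options for a value nested in a syntax tree.
function optionsFor(node) {
  const start = node.position ? node.position.start : defaultPoint
  // Copy the point so the options object does not share it with the node.
  return {position: {...start}}
}

// Hypothetical node; only `value` and `position.start` are assumed here.
const node = {
  value: 'alpha &amp bravo',
  position: {start: {line: 3, column: 5, offset: 42}}
}

const options = optionsFor(node)
// The result could then be passed as the second argument:
// parseEntities(node.value, options)
```

Copying the point is a design choice for the sketch, so that later changes to the options object cannot affect the tree.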

options.referenceContext

Context used when calling reference (unknown, optional).

options.reference

Reference handler (ReferenceHandler, optional).

options.textContext

Context used when calling text (unknown, optional).

options.text

Text handler (TextHandler, optional).

options.warningContext

Context used when calling warning (unknown, optional).

options.warning

Error handler (WarningHandler, optional).

parseEntities(value[, options])

Parse HTML character references.

Parameters
  • value (string): value to decode
  • options (Options, optional): configuration
Returns

Decoded value (string).

ReferenceHandler

Character reference handler.

Parameters
  • this (*): refers to referenceContext when given to parseEntities
  • value (string): decoded character reference
  • position (Position): place where source starts and ends
  • source (string): raw source of character reference
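
A handler with this signature can record every decoded reference. The sketch below is illustrative only: `createReferenceCollector` is a hypothetical helper, and the handler is called by hand with made-up arguments to mimic the parameter list above rather than being invoked by the package.

```javascript
// Hypothetical factory: returns a handler plus the list it fills.
function createReferenceCollector() {
  const references = []

  // A regular function (not an arrow function) is used so that `this`
  // can refer to `referenceContext`.
  function handler(value, position, source) {
    references.push({value, source, position, context: this})
  }

  return {references, handler}
}

const collector = createReferenceCollector()
// Intended use (not run here):
// parseEntities('a &copy; b', {reference: collector.handler, referenceContext: ctx})

// Simulate one call, following the parameter list above.
const context = {file: 'example.html'}
collector.handler.call(
  context,
  '©',
  {start: {line: 1, column: 3, offset: 2}, end: {line: 1, column: 9, offset: 8}},
  '&copy;'
)
```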

TextHandler

Text handler.

Parameters
  • this (*): refers to textContext when given to parseEntities
  • value (string): string of content
  • position (Position): place where value starts and ends
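
A text handler can be sketched in the same way. This example assumes that a Position has `start` and `end` points, each with an `offset`; `createTextCollector` is a hypothetical helper, and the calls at the end are simulated by hand.

```javascript
// Hypothetical factory: collects runs of plain text with their offsets.
function createTextCollector() {
  const chunks = []

  function handler(value, position) {
    chunks.push({
      value,
      start: position.start.offset,
      end: position.end.offset
    })
  }

  return {chunks, handler}
}

const textCollector = createTextCollector()
// Intended use (not run here):
// parseEntities('alpha &amp bravo', {text: textCollector.handler})

// Simulated calls for the two text runs around `&amp`.
textCollector.handler('alpha ', {
  start: {line: 1, column: 1, offset: 0},
  end: {line: 1, column: 7, offset: 6}
})
textCollector.handler(' bravo', {
  start: {line: 1, column: 11, offset: 10},
  end: {line: 1, column: 17, offset: 16}
})
```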

WarningHandler

Error handler.

Parameters
  • this (*): refers to warningContext when given to parseEntities
  • reason (string): human-readable reason for emitting a parse error
  • point (Point): place where the error occurred
  • code (number): machine-readable code for the error

The following codes are used:

Code  Example            Note
1     foo &amp bar       Missing semicolon (named)
2     foo &#123 bar      Missing semicolon (numeric)
3     Foo &bar baz       Empty (named)
4     Foo &#             Empty (numeric)
5     Foo &bar; baz      Unknown (named)
6     Foo &#128; baz     Disallowed reference
7     Foo &#xD800; baz   Prohibited: outside permissible Unicode range
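
For a linter, the codes above can be mapped to short notes and gathered by a warning handler. This is a rough sketch: `warningNotes` and `createWarningCollector` are hypothetical names, and the reason string and point in the simulated call are made up, since the real values come from the package.

```javascript
// Notes copied from the table of codes above.
const warningNotes = {
  1: 'Missing semicolon (named)',
  2: 'Missing semicolon (numeric)',
  3: 'Empty (named)',
  4: 'Empty (numeric)',
  5: 'Unknown (named)',
  6: 'Disallowed reference',
  7: 'Prohibited: outside permissible Unicode range'
}

// Hypothetical factory: returns a warning handler plus the list it fills.
function createWarningCollector() {
  const messages = []

  function handler(reason, point, code) {
    messages.push({
      code,
      note: warningNotes[code] || 'Unknown code',
      reason,
      line: point.line,
      column: point.column
    })
  }

  return {messages, handler}
}

const lint = createWarningCollector()
// Intended use (not run here):
// parseEntities('foo &amp bar', {warning: lint.handler})

// Simulated call with illustrative values only.
lint.handler('illustrative reason', {line: 1, column: 9, offset: 8}, 1)
```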

Compatibility

This project is compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, parse-entities@4, compatible with Node.js 16.

Security

This package is safe: it matches the HTML spec to parse character references.

Contribute

Yes please! See How to Contribute to Open Source.

License

MIT © Titus Wormer