nmichel/ejpet

ejpet

Matching JSON nodes in Erlang.

What for ?

A kind of regular expression, applied to JSON documents.

  • Find if a JSON document has some structural properties, and possibly extract some information.
  • Useful to extract small data pieces from large JSON documents.
  • Efficient filtering of JSON nodes in real time.

Backends for jsone, jsx, jiffy and mochijson2.

Quick start

Obtain ejpet

Add it to your project

Add a dependency on ejpet, and possibly on a supported JSON codec, to your project's dependency set.

  • With rebar3, in rebar.config file
{deps, [
    %% ...
    {ejpet, ".*", {git, "git://github.com/nmichel/ejpet.git", {tag, "0.7.0"}}},
    {jsx, ".*", {git, "https://github.com/talentdeficit/jsx.git", {tag, "v2.8.3"}}},
    %% ...
]}.
  • With mix, in mix.exs file
defmodule MyProject.Mixfile do
  use Mix.Project
  
  def project do
    [
      # ...
      deps: deps()
      # ...
    ]
  end
  
  defp deps() do
    [
      # ...
      {:ejpet, "~> 0.7.0"},
      {:jsx, "~> 2.8"},
      # ...
    ]
  end
end

From source

Clone

$ git clone git@github.com:nmichel/ejpet.git

Build

$ cd ejpet
$ ./rebar get-deps
$ make && make test

Start Erlang shell

erl -pz ./ebin ./deps/*/ebin

Start (m)using

Read some JSON data

1> {ok, Data} = file:read_file("./test/channels_list.json").
{ok,<<239,187,191,91,13,10,32,32,32,32,123,13,10,32,32,
      32,32,32,32,32,32,34,110,117,109,98,101,...>>}

Decode JSON using, say, jsx (provided you have jsx in your load path)

2> Node = jsx:decode(Data).
[[{<<"number">>,1},
  {<<"lcn">>,2},
  {<<"name">>,<<"France 2">>},
  {<<"sap_group">>,<<>>},
  {<<"ip_multicast">>,<<"239.100.10.1">>},
  {<<"port_multicast">>,1234},
  {<<"num_clients">>,0},
  {<<"scrambling_ratio">>,0},
  {<<"is_up">>,1},
  {<<"pcr_pid">>,120},
  {<<"pmt_version">>,4},
  {<<"unicast_port">>,0},
  {<<"service_id">>,257},
  {<<"service_type">>,
   <<"Please report : Unknown service type doc : EN 30"...>>},
  {<<"pids_num">>,7},
  {<<"pids">>,
...

Ok. Now define what we are looking for, and what we want to get

Find, somewhere in a list, an object with
* a {"ip_multicast", "239.100.10.4"} pair,
* a key "pcr_pid", whose value (whatever it is) is captured in variable "pcr",
* a key "pids", whose value is either a list or an object containing
  * an object with
    * a key "language" whose value matches regex "^fr",
    * a key "number", whose value (whatever it is) is captured in variable "apid",
    * a key "type", whose value (whatever it is) is captured in variable "acodec"
  * an object with
    * a key "type", whose value matches regex "Video" and is captured in variable "vcodec",
    * a key "number", whose value (whatever it is) is captured in variable "vpid"
3>  O = ejpet:compile("[*, {\"ip_multicast\":\"239.100.10.4\",
                            \"pcr_pid\":(?<pcr>_),
                            \"pids\":<{\"language\": #\"^fr\",
                                       \"number\": (?<apid>_),
                                       \"type\": (?<acodec>_)},
                                      {\"type\": (?<vcodec>#\"Video\"),
                                       \"number\": (?<vpid>_)}>}, *]", jsx).
{ejpet,jsx,#Fun<ejpet_jsx_generators.9.11467207>}

Run and seek ...

4>  ejpet:run(Node, O).

Here you are !

{true,[{"vpid",520},
       {"vcodec",[<<"Video (MPEG2)">>]},
       {"acodec",[<<"Audio (MPEG1)">>]},
       {"apid",530},
       {"pcr",520}]}
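
To read a single capture out of such a result without depending on its exact shape, the accessor functions listed in the API section can be combined. The following is a minimal sketch, assuming a result produced with the jsx backend; the module name `capture_sketch` and the function `lookup/2` are hypothetical and not part of ejpet.

```erlang
-module(capture_sketch).
-export([lookup/2]).

%% Hypothetical helper (not part of ejpet).
%% Res is the value returned by ejpet:run/2,3 with a jsx-compiled pattern.
%% Name is the capture variable name, as a string or a binary.
lookup(Res, Name) ->
    case ejpet:get_status(Res) of
        false ->
            %% The pattern did not match at all.
            no_match;
        true ->
            %% ejpet:get_capture/2 assumes the jsx backend and returns
            %% {ok, Captured} or not_found.
            ejpet:get_capture(Res, Name)
    end.
```

With the session above, `capture_sketch:lookup(ejpet:run(Node, O), "vpid")` would look up the "vpid" capture.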

How ?

Express what you want to match using a simple expression language.

Expression syntax

| Pattern | Matches | Notes |
|---|---|---|
| true | true | |
| false | false | |
| null | null | |
| "string" | the string "string" | UTF-8 encoded string (with escaping) |
| #"regex" | any string matching regex "regex" | UTF-8 encoded string (no escaping) |
| number | the number number | e.g. 42, 3.14159, -3395.1264e-22 |
| { kv* } | object for which all kv (key/value) patterns are matched | Order does not matter |
| [ item* (, *)?] | list for which all item patterns are matched | Order DOES matter |
| < value* > | value set (list, or object values) for which all value patterns are matched | Order does not matter |
| < value* >/g | same as previous but search for ALL matches. Useful only when capturing | Order does not matter |
| <! value* !> | same as < value* > but search deep | |
| <! value* !>/g | same as previous but search for ALL matches. Useful only when capturing | |
| (?<name>expr) | capture expression expr in return value name | Every JSON expression may be captured |
| (!<name>type) | match JSON object of type type against parameter named name | |

kv may take one of the forms

  • _:pattern
  • "key":_
  • "key":pattern

item may take one of the forms

  • *, pattern
  • pattern

value is a pattern

kv, item and value are separated by ,.

In parameter injection, type may be

  • number
  • boolean
  • string
  • regex

Notes

Numbers

Number matching may be strict or loose, depending on the number_strict_match option passed at compile time.

1> ejpet:match(<<"42.0">>, "42").
{true,<<"{}">>}
2> ejpet:match(<<"42.0">>, "42", [{number_strict_match, true}]).
{false,<<"{}">>}

Strings and Regex

string and regex are UTF-8 encoded byte streams.

They may contain escape sequences, such as "\\b" or "\u00E9". In a string, these sequences are interpreted by default (set option string_apply_escape_sequence to false to leave them as-is). In a regex, they are not interpreted.

3> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, true}]).
{true,<<"{}">>}
4> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{false,<<"{}">>}
5> ejpet:match(<<"\"\\\\u00E9\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{true,<<"{}">>}

The codepoint produced by evaluating an escape sequence of the form \uABCD is NOT checked. One can insert any codepoint, valid or not, in a string or regex.

Captures

Every pattern p can be captured by simply replacing it with (?<variable_name>p). Captures are returned as a JSON object, where each variable_name is a key, and the list of captures found for that variable is the value.

This JSON object is built with respect to the backend indicated when compiling the pattern.

Warning: if there are no captures to return, the empty JSON object {} is returned, but its actual form depends on the backend.

  • jsx: [{}]
  • jiffy: {[]}
  • mochijson2: {struct, []}
  • jsone: #{}
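
Because the empty capture set differs per backend, comparing a result against a literal such as [{}] ties the code to one codec. A minimal sketch of a backend-independent check, built on the ejpet:get_captures/1 and ejpet:empty_capture_set/1 functions listed in the API section; the module name `capture_set_sketch` and the function `has_captures/2` are hypothetical.

```erlang
-module(capture_set_sketch).
-export([has_captures/2]).

%% Hypothetical helper (not part of ejpet).
%% Res is a result of ejpet:run/2,3; Backend is the backend the pattern
%% was compiled with (jsx | jiffy | mochijson2 | jsone).
%% Returns true when at least one capture was returned.
has_captures(Res, Backend) ->
    ejpet:get_captures(Res) =/= ejpet:empty_capture_set(Backend).
```

For a compiled pattern EPM, the backend can be recovered with ejpet:backend(EPM), so the call may be written `capture_set_sketch:has_captures(ejpet:run(Node, EPM), ejpet:backend(EPM))`.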

One may wonder why captures are returned as an encoded JSON object. There are two reasons:

  1. captured objects are kept "as is" from the parsed document, i.e. in their encoded form. Using the backend encoding for the result is more coherent;
  2. the captures JSON object can itself be pattern matched.

Parameters Injection

It is possible to provide some matching values at match-time, through parameter injection forms like (!<param_name>param_type), where param_type may be number, string, boolean or regex. At match-time, the produced matching functions look for an entry named param_name in the provided parameters list. See ejpet:run/3 and ejpet:match/4.

Note that string values should be binaries, and regex values MUST be mp() opaque objects returned by re:compile/2.
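
As an illustration of regex injection, the sketch below compiles the regex with re:compile/1 and passes the resulting mp() value as a parameter to ejpet:run/3. It assumes the jsx backend; the module name `injection_sketch`, the function `has_string_matching/2`, and the parameter name "re" and capture name "hit" are hypothetical choices made for this example.

```erlang
-module(injection_sketch).
-export([has_string_matching/2]).

%% Hypothetical helper (not part of ejpet).
%% JSONText is a binary holding a JSON list or object.
%% Regex is the regular expression source, e.g. <<"^fr">>.
%% Returns true | false, as reported by ejpet:get_status/1.
has_string_matching(JSONText, Regex) ->
    %% Regex parameters must be mp() values, so compile first.
    {ok, MP} = re:compile(Regex),
    %% The pattern only declares that a regex named "re" will be injected.
    EPM = ejpet:compile("<(?<hit>(!<re>regex))>", jsx),
    Node = ejpet:decode(JSONText, jsx),
    %% Parameter names are binaries.
    ejpet:get_status(ejpet:run(Node, EPM, [{<<"re">>, MP}])).
```

The compiled pattern does not depend on the regex itself, so in real code it could be compiled once and reused with different parameter values.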

API

backend() = jsx | jiffy | mochijson2 | jsone
epm() = {ejpet, term(), term()}
expr_src() = string()
compile_option() = {string_apply_escape_sequence, boolean()}
                 | {number_strict_match, boolean()}

json_input() = string() | binary()
json_src() = binary()
json_term() = jsx_term() | jiffy_term() | mochijson2_term()

run_param_name() = binary()
run_param_value() = boolean() | number() | binary() | re:mp()
run_param() = {run_param_name(), run_param_value()}

run_res() = {match_stat(), json_term()}
match_res() = {match_stat(), json_src()}
match_stat() = true | false

ejpet:decode(JSONText, Backend) -> json_term()

  JSONText = json_input()
  Backend = backend()

ejpet:encode(JSONTerm, Backend) -> json_term()

  JSONTerm = json_term()
  Backend = backend()

ejpet:compile(Expr, Backend, Options) -> epm()

  Expr = expr_src()
  Backend = backend()
  Options = [Option]
  Option = compile_option()

ejpet:compile(Expr, Backend) -> epm()

  Same as ejpet:compile(Expr, Backend, [])
  
ejpet:compile(Expr) -> epm()

  Same as ejpet:compile(Expr, jsx, [])

ejpet:backend(EPM) -> backend()

  EPM = epm()

ejpet:run(JSONTerm, EPM, Params) -> run_res()

  EPM = epm()
  JSONTerm = json_term()
  Params = [Param]
  Param = run_param()

ejpet:run(JSONTerm, EPM) -> run_res()

  Same as ejpet:run(JSONTerm, EPM, [])

ejpet:match(JSONText, Expr, Options, Params) -> match_res()

  JSONText = json_input()
  Expr = expr_src() | epm()
  Options = [Option]
  Option = compile_option()
  Params = [Param]
  Param = run_param()

ejpet:match(JSONText, Expr, Options) -> match_res()

  Same as ejpet:match(JSONText, Expr, Options, [])
  
ejpet:match(JSONText, Expr) -> match_res()

  Same as ejpet:match(JSONText, Expr, [], [])
  
ejpet:get_status(Res) -> match_stat()

  Res = run_res() | match_res()

get_captures(Res) -> json_term()

  Res = run_res() | match_res()
  
get_capture(Res, Name) -> {ok, json_term()} | not_found

  Same as get_capture(Res, Name, jsx)

get_capture(Res, Name, Backend) ->  {ok, json_term()} | not_found

  Res = run_res()
  Name = string() | binary()
  Backend = backend()

empty_capture_set() -> json_term()

  Same as empty_capture_set(jsx)
  
empty_capture_set(Backend) -> json_term()

  Backend = backend()
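
The functions above compose into a compile-once, run-many workflow, which is how filtering a stream of documents would typically be written. A minimal sketch, assuming the jsx backend; the module name `ejpet_filter_sketch` and the function `matching/2` are hypothetical and not part of ejpet.

```erlang
-module(ejpet_filter_sketch).
-export([matching/2]).

%% Hypothetical helper (not part of ejpet).
%% Expr is an ejpet expression (string); JSONTexts is a list of binaries,
%% each holding one JSON document.
%% Returns the documents that match Expr, in their original order.
matching(Expr, JSONTexts) ->
    %% Compile the expression once, then reuse it for every document.
    EPM = ejpet:compile(Expr, jsx),
    [Text || Text <- JSONTexts,
             ejpet:get_status(ejpet:run(ejpet:decode(Text, jsx), EPM))].
```

When each document is only matched once, ejpet:match/2 does the decoding and compiling in a single call, at the cost of recompiling the expression every time.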

Examples

Basics

| Expression | Match | No match | Code snippet |
|---|---|---|---|
| 42 | 42 | "42", [42], {"key": 42} | ejpet:match(<<"42">>, "42"). |
| "42" | "42" | 42, ["42"], {"key": "42"} | ejpet:match(<<"\"42\"">>, "\"42\""). |
| true | true | "true", [true] | ejpet:match(<<"true">>, "true"). |
| false | false | "false", [false] | ejpet:match(<<"false">>, "false"). |
| null | null | "null", [null] | ejpet:match(<<"null">>, "null"). |
| #"foo" | "foobar", "barfoo" | "barfo" | ejpet:match(<<"\"foobar\"">>, "#\"foo\""). |
| #"^foo" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"^foo\""). |
| #"bar$" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"bar$\""). |

Objects

| Expression | Match | No match | Code snippet |
|---|---|---|---|
| {_:42} | {"bar": 42}, {"bar": 47, "foo": 42} | {"bar": 47}, {"foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{_:42}"). |
| {"foo":_} | {"foo": 42}, {"bar": 42, "foo": {}} | {"bar": "foo"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":_}"). |
| {"foo":42} | {"foo": 42}, {"bar": "42", "foo": 42} | {"bar": 42, "foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":42}"). |
| {_:{"foo": 42}, "bar": {_:#"bar"}} | {"neh": {"foo": 42}, "bar": {"nimp": "foobar"}} | {"neh": {"notfoo": 42}, "bar": {"nimp": "foobar"}} | ejpet:match(<<"{\"neh\": {\"foo\": 42}, \"bar\": {\"nimp\": \"foobar\"}}">>, "{_:{\"foo\": 42}, \"bar\": {_:#\"bar\"}}"). |

Lists

| Expression | Match | No match | Code snippet |
|---|---|---|---|
| ["42"] | ["42"] | {"bar": "42"}, {"foo": 42}, [42], ["42", "42"] | ejpet:match(<<"[\"42\"]">>, "[\"42\"]"). |
| [*, "42"] | ["42"], ["42", "42"], [true, "42"] | {"bar": "42"}, {"foo": 42}, [42], ["42", true] | ejpet:match(<<"[true, \"42\"]">>, "[*, \"42\"]"). |
| [*, "42", *] | ["42"], ["42", "42"], [true, "42"], ["42", true], [{}, "42", true] | {"bar": "42"}, {"foo": 42}, [42] | ejpet:match(<<"[true, \"42\", {}]">>, "[*, \"42\", *]"). |
| [[42]] | [[42]] | [42], [[42], 42] | ejpet:match(<<"[[42]]">>, "[[42]]"). |
| [*, [42]] | [[42]], ["42", [42]] | [[42], 42] | ejpet:match(<<"[\"42\", [42]]">>, "[*, [42]]"). |
| [[42], *] | [[42]], [[42], 42] | ["42", [42]] | ejpet:match(<<"[[42], \"42\"]">>, "[[42], *]"). |

Value sets (lists or object value set)

| Expression | Match | No match | Code snippet |
|---|---|---|---|
| <42> | [42], {"key": 42} | 42, "42" | ejpet:match(<<"{\"key\": 42}">>, "<42>"). |
| <"42"> | ["42"], {"bar": "42"}, [42, "42"], ["42", 42] | [42], {"bar": 47}, {"foo": 42} | ejpet:match(<<"{\"bar\": \"42\"}">>, "<\"42\">"). |
| <!"42"!> | ["42"], [true, "42"], ["foo", ["42", true], {}], [{}, {"foo": "42"}, true], {"bar": "42"}, {"bar": {"foo": "42"}} | "42", {"foo": 42}, [42] | ejpet:match(<<"[true, [null, {\"foo\": \"42\"}, \"bar\"], {}]">>, "<!\"42\"!>"). |
| <!<!"42"!>!> | [["42"]], [{}, {"foo": "42"}, true], {"bar": {"foo": "42"}} | ["42"], {"bar": "42"} | ejpet:match(<<"[{\"foo\":\"42\"}]">>, "<!<!\"42\"!>!>"). |

Captures

| Expression | Test | Capture(s) | Code snippet |
|---|---|---|---|
| <!(?<subnode>{_:42})!> | [{"foo": null}, {"foo": 42, "bar": {}}] | subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "<!(?<subnode>{_:42})!>"). |
| (?<all><!(?<subnode>{_:42})!>) | [{"foo": null}, {"foo": 42, "bar": {}}] | all: [[{"foo":null},{"foo":42,"bar":{}}]], subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)"). |

Global captures

| Expression | Test | Capture(s) | Code snippet |
|---|---|---|---|
| <(?<node>{"codec":_, "lang":(?<lang>_)})>/g | [{"codec": "audio", "lang": "fr"}, {"codec": "video", "lang": "en"}, {"codec": "foo", "lang": "it"}] | node: [{"codec":"audio","lang":"fr"}, {"codec":"video","lang":"en"}, {"codec":"foo","lang":"it"}], lang: ["fr", "en", "it"] | ejpet:match(<<"[{\"codec\": \"audio\", \"lang\": \"fr\"}, {\"codec\":\"video\", \"lang\": \"en\"}, {\"codec\": \"foo\", \"lang\": \"it\"}]">>, <<"<(?<node>{\"codec\":_, \"lang\":(?<lang>_)})>/g">>). |

Injections

| Expression | Test | Parameters | Capture(s) | Code snippet |
|---|---|---|---|---|
| <(?<subnode>(!<what>number))> | [41, 42, 43] | [{<<"what">>, 42}] | subnode: [42] | ejpet:match(<<"[41, 42, 43]">>, "<(?<subnode>(!<what>number))>", [], [{<<"what">>, 42}]). |

Notes

In the tables above, captured values are written as abstract JSON nodes, for illustration purposes. As explained previously, the actual capture result depends on the API function used, and may be:

  • serialized JSON nodes (as in the "Code snippet" column), with ejpet:match()
1> ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)").
{true,<<"{\"all\":[[{\"foo\":null},{\"foo\":42,\"bar\":{}}]],\"subnode\":[{\"foo\":42,\"bar\":{}}]}">>}
  • (jsx | jiffy | mochijson2) JSON value, depending on the backend, for easier further processing, with ejpet:run()
1> JSX = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", jsx, []).
{ejpet,jsx,#Fun<ejpet_jsx_generators.19.98422695>}
2> ejpet:run((ejpet:backend(JSX)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), JSX).
{true,[{"all",
        [[[{<<"foo">>,null}],[{<<"foo">>,42},{<<"bar">>,[{}]}]]]},
       {"subnode",[[{<<"foo">>,42},{<<"bar">>,[{}]}]]}]}

39> Mochi = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", mochijson2, []).
{ejpet,mochijson2,
       #Fun<ejpet_mochijson2_generators.19.110863078>}
40> ejpet:run((ejpet:backend(Mochi)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), Mochi).
{true,{struct,[{<<"all">>,
                [[{struct,[{<<"foo">>,null}]},
                  {struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]]},
               {<<"subnode">>,
                [{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]}]}}
