@snej snej commented Jan 22, 2025

The JSONSchema class parses a JSON Schema and validates Fleece Values against it. See the header file for class documentation. The JSON Schema docs and tutorial are useful background.

Other Fleece API changes supporting this:

  • Added slice::UTF8Length(), which counts the number of UTF-8 code points
  • Added FLEvalJSONPointer(), a wrapper around fleece::impl::Path::evalJSONPointer()
  • Added FLDictIterator_BeginSK(), exposing the existing ability to pass a SharedKeys to a Dict::iterator

Bug fixes:

  • Fixed some limitations of fleece::impl::Path::evalJSONPointer() (it didn't handle escaped keys)
  • Fixed a SharedKeys issue with empty-string keys

Optimizations:

  • Optimized Dict lookup from a Dict::key (aka FLDictKey), mostly by avoiding locating the Dict's SharedKeys unless we have to, since that's very slow.

@snej snej changed the title from "JS" to "JSON Schema validator" Jan 22, 2025
@snej snej marked this pull request as draft January 22, 2025 20:14
@snej snej force-pushed the feature/schema branch 3 times, most recently from 4073d2c to f3f59a6 Compare January 23, 2025 20:35
snej added 4 commits January 29, 2025 10:05
I did this so the JSON Schema benchmark could be fully optimized.
Code adapted from LiteCore's StringUtils.cc
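
For readers unfamiliar with why a code-point count differs from a byte count, here is a minimal sketch of one common way to count UTF-8 code points. It is not the code in this PR or in LiteCore; the function name `countUTF8CodePoints` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the Fleece implementation.
// Every code point starts with exactly one byte that is NOT a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx), so counting the
// non-continuation bytes gives the number of code points in valid UTF-8.
inline std::size_t countUTF8CodePoints(std::string_view str) {
    std::size_t count = 0;
    for (unsigned char c : str) {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

A validator needs something like this because JSON Schema string-length keywords count characters rather than bytes, so "h\xC3\xA9llo" (6 bytes) has length 5.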
- FLEvalJSONPointer() exposes impl::Path::evalJSONPointer().
- It now properly handles an empty path.
- It now properly handles a path with a trailing "/".
- It now properly handles those weird "~" escapes (see RFC 6901).
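
The three cases above (empty path, trailing "/", and "~" escapes) can be illustrated with a small standalone tokenizer. This is only a sketch of the RFC 6901 rules, not the code in `impl::Path::evalJSONPointer()`; the function name `parseJSONPointer` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper: splits an RFC 6901 JSON Pointer into unescaped
// reference tokens. Returns nullopt if the pointer is malformed.
std::optional<std::vector<std::string>> parseJSONPointer(std::string_view ptr) {
    std::vector<std::string> tokens;
    if (ptr.empty())
        return tokens;                  // "" refers to the whole document
    if (ptr.front() != '/')
        return std::nullopt;            // a non-empty pointer must start with '/'
    std::string token;
    for (std::size_t i = 1; i < ptr.size(); ++i) {
        char c = ptr[i];
        if (c == '/') {
            tokens.push_back(std::move(token));
            token.clear();              // reset the moved-from string
        } else if (c == '~') {
            if (++i >= ptr.size())
                return std::nullopt;    // dangling '~'
            if (ptr[i] == '0')      token += '~';   // "~0" means '~'
            else if (ptr[i] == '1') token += '/';   // "~1" means '/'
            else return std::nullopt;
        } else {
            token += c;
        }
    }
    // The final token may be empty: "/foo/" addresses the key "" inside "foo".
    tokens.push_back(std::move(token));
    return tokens;
}
```

Under these rules "/a~1b/m~0n" yields the tokens "a/b" and "m~n", "/foo/" yields "foo" followed by an empty-string token, and "" yields no tokens at all.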
Exposes existing internal method.
snej added 5 commits January 29, 2025 11:24
As an optimization, allow a Dict iterator to be created with its
SharedKeys already known; saves a lookup.
The jsonsl parser we use follows an older version of the JSON spec
which didn't allow a document to be a scalar value, only an array or
object.

I've run into this issue a few times lately -- for example, JSON
Schema considers `true` or `false` a valid schema, but we can't
parse it, so a few tests in their test suite break.

I made a small change to JSONConverter to detect when the input isn't
an array or object, and wrap it in `[...]` so that jsonsl can parse
it. (The converter then skips that outer array, so the result is just
the scalar.)
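
A rough sketch of the wrapping idea, for illustration only. It is not the JSONConverter change itself; `WrappedJSON` and `wrapScalarJSON` are hypothetical names, and the real code may detect the scalar case differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type: the text to hand to the parser, plus a flag
// telling the caller to unwrap the single-element outer array afterwards.
struct WrappedJSON {
    std::string text;
    bool wrapped;
};

// If the first non-whitespace character is not '[' or '{', the document is a
// scalar, so wrap it in an array that an array/object-only parser accepts.
WrappedJSON wrapScalarJSON(std::string_view json) {
    std::size_t start = json.find_first_not_of(" \t\r\n");
    bool needsWrap = start != std::string_view::npos
                     && json[start] != '[' && json[start] != '{';
    if (!needsWrap)
        return {std::string(json), false};   // container, or blank input
    std::string text = "[";
    text.append(json.data(), json.size());
    text += ']';
    return {std::move(text), true};
}
```

With this, `true` becomes `[true]` before parsing, while `{"a":1}` and `[1]` pass through untouched. Blank input is deliberately left unwrapped so the parser still reports it as an error instead of seeing `[]`.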
A weird edge case that showed up when running the JSON Schema test
suite, which includes some JSONPointer paths with empty-string
components.

The bug is that `_table.find(str)` finds the entry for the empty
string, but the following test `entry.key != nullslice` fails
because an empty slice compares == nullslice, since both have zero
length. The proper ways to test whether a slice is not nullslice are
`if(entry.key)`, which doesn't work here because `__usuallyTrue`
requires a boolean, or `entry.key.buf != nullptr`.
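
To make the pitfall concrete, here is a simplified stand-in for a slice type. The `Slice` struct, `kNullSlice`, and the two `isPresent...` functions are hypothetical and only mimic the behavior described above; they are not Fleece's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for a slice: a pointer plus a length.
struct Slice {
    const void* buf = nullptr;
    std::size_t size = 0;
    explicit operator bool() const { return buf != nullptr; }
};

// Equality compares contents only, so any two zero-length slices are equal
// regardless of whether their pointers are null.
inline bool operator==(const Slice& a, const Slice& b) {
    return a.size == b.size
        && (a.size == 0 || std::memcmp(a.buf, b.buf, a.size) == 0);
}
inline bool operator!=(const Slice& a, const Slice& b) { return !(a == b); }

constexpr Slice kNullSlice{};

// Buggy presence test: an empty-string key looks identical to "no key".
inline bool isPresentBuggy(const Slice& key) { return key != kNullSlice; }

// Fixed presence test: only a null pointer means "no key".
inline bool isPresentFixed(const Slice& key) { return key.buf != nullptr; }
```

For a key `Slice{"", 0}` the buggy test reports "absent" while the pointer test reports "present", which is the difference the fix relies on.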
There's no reason for it to be an instance method, and the optimizations
I'm making to Dict require calling it without having a SharedKeys
instance.
These came about from profiling the JSONSchema validator, and speed
up the travel-sample validation benchmark by about 10%.

(1) `DictImpl::get(int)` uses a binary search, but for small Dicts
it's faster to just scan all the keys, especially since we can
precompute what the two bytes of the matching key must be.

(2) `DictImpl::get(slice,SharedKeys*)` can skip looking up the Dict's
SharedKeys (which is slow) if the key isn't alphanumeric since it
won't ever be a shared key.

(3) `DictImpl::get(Dict::key&)` shouldn't waste any time looking up
sharedKeys if it already knows the numeric key.
I also changed its boolean `_hasNumericKey` flag into an int8_t with
three states: 0 for unknown, 1 for true, and -1 for "can't be a
shared key". This latter state avoids trying to look up the key
every single time when it won't ever succeed.
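
The three-state flag from (3), combined with the eligibility shortcut from (2), might look roughly like the sketch below. Everything here is a hypothetical illustration: `SharedKeyTable`, `couldBeSharedKey`, `KeyState`, and `CachedKey` are made-up names, the table is a plain hash map, and the "all characters alphanumeric" rule is a simplification of whatever eligibility rule Fleece really applies.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a shared-keys table: string -> small integer.
using SharedKeyTable = std::unordered_map<std::string, int>;

// Simplified eligibility rule (an assumption): only alphanumeric strings
// can ever be shared keys.
inline bool couldBeSharedKey(std::string_view key) {
    return std::all_of(key.begin(), key.end(),
                       [](unsigned char c) { return std::isalnum(c) != 0; });
}

// 0 = unknown, 1 = numeric key known, -1 = can never be a shared key.
enum class KeyState : std::int8_t { Unknown = 0, Shared = 1, NeverShared = -1 };

class CachedKey {
public:
    explicit CachedKey(std::string key) : _key(std::move(key)) {}

    // Returns the numeric key if there is one, consulting the table only
    // while the state is still Unknown.
    std::optional<int> numericKey(const SharedKeyTable& table) {
        if (_state == KeyState::Shared)      return _numeric;
        if (_state == KeyState::NeverShared) return std::nullopt;
        if (!couldBeSharedKey(_key)) {
            _state = KeyState::NeverShared;  // never try the table for this key
            return std::nullopt;
        }
        ++_lookups;                          // the slow path
        auto it = table.find(_key);
        if (it == table.end())
            return std::nullopt;             // stays Unknown: may be added later
        _numeric = it->second;
        _state = KeyState::Shared;
        return _numeric;
    }

    int lookups() const { return _lookups; } // number of slow table lookups

private:
    std::string _key;
    int         _numeric = 0;
    int         _lookups = 0;
    KeyState    _state = KeyState::Unknown;
};
```

In this sketch a key such as "name" hits the table once and is answered from the cache afterwards, while a key such as "not shared!" never touches the table at all.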
@snej snej force-pushed the feature/schema branch 2 times, most recently from 3037b5b to 85bd947 Compare January 29, 2025 23:16
@snej snej marked this pull request as ready for review May 1, 2025 16:34
@snej snej requested review from callumbirks and jianminzhao May 1, 2025 16:35