Rewrite prefix generation and replacement logic #1539

Open
kmcginnes wants to merge 3 commits into aws:main from kmcginnes:feature/schema/rdf-namespace-duplicates

kmcginnes commented Feb 24, 2026

Description

  • Add splitIri, generatePrefix, and commonPrefixes utility modules
  • Add PrefixLookup class for fast Map-based prefix lookups
  • Rewrite generatePrefixes to use splitIri and generatePrefix
  • Rewrite replacePrefixes to use PrefixLookup and splitIri
  • Change usePrefixes/prefixesAtom to return PrefixLookup
  • Remove __matches from PrefixTypeConfig
  • Simplify saveConfigurationToFile and useImportConnectionFile
  • Ensure generated prefix names are unique by appending a numeral on collision (e.g. soccer, soccer2, soccer3)
  • Generate prefixes from vertex and edge IDs, not just types and attribute names
  • Validate IRI local values against RDF rules, rejecting values with invalid characters (spaces, angle brackets, curly braces, pipes, carets, backticks, backslashes)
  • Add backward compatibility tests for legacy __matches data
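
To illustrate the Map-based lookup idea from the list above, here is a minimal sketch. The names `SimplePrefixLookup`, `PrefixEntry`, and `splitAtLastDelimiter` are hypothetical and are not the PR's actual `PrefixLookup` or `splitIri`; the split rule (last `#` or `/`) is an assumption for illustration only.

```typescript
// Hypothetical shapes for illustration; the PR's real PrefixLookup may differ.
type PrefixEntry = { prefix: string; uri: string };

// Assumed split rule: cut at the last '#' or '/', keeping the delimiter in the namespace.
function splitAtLastDelimiter(
  iri: string,
): { namespace: string; local: string } | null {
  const index = Math.max(iri.lastIndexOf("#"), iri.lastIndexOf("/"));
  if (index < 0 || index === iri.length - 1) return null;
  return { namespace: iri.slice(0, index + 1), local: iri.slice(index + 1) };
}

class SimplePrefixLookup {
  private readonly byNamespace = new Map<string, string>();

  constructor(entries: PrefixEntry[]) {
    for (const { prefix, uri } of entries) {
      // The first prefix registered for a namespace wins.
      if (!this.byNamespace.has(uri)) this.byNamespace.set(uri, prefix);
    }
  }

  /** Returns "prefix:local" when the namespace is known, otherwise the IRI unchanged. */
  shorten(iri: string): string {
    const parts = splitAtLastDelimiter(iri);
    if (!parts) return iri;
    const prefix = this.byNamespace.get(parts.namespace);
    return prefix ? `${prefix}:${parts.local}` : iri;
  }
}

const lookup = new SimplePrefixLookup([
  { prefix: "soccer", uri: "http://www.example.com/soccer/ontology/" },
]);
const shortened = lookup.shorten("http://www.example.com/soccer/ontology/League");
const untouched = lookup.shorten("http://www.schema.org/City");
```

The point of the Map is that each replacement becomes one split plus one key lookup, rather than a scan over every known prefix.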

Validation

  • All existing tests pass
  • New unit tests cover splitIri, generatePrefix, PrefixLookup, commonPrefixes, prefix uniqueness, entity ID prefix generation, local value validation, and backward compatibility with legacy data

Prefix generation examples:

| Rule | IRI | Prefix |
| --- | --- | --- |
| Single path segment | http://www.example.com/soccer/ontology/League | soccer |
| Host fallback (no meaningful segments) | http://www.schema.org/City | schema |
| Skips resource segment | http://data.nobelprize.org/resource/country/France | country |
| Skips class segment | https://dbpedia.org/class/yago/Record106647206 | yago |
| Skips numeric-only segments | http://example.org/2024/01/schema#Thing | schema |
| Multi-segment: first word + abbreviation | http://kelvinlawrence.net/air-routes/datatypeProperty/name | airdp |
| Multi-segment: camelCase abbreviation | http://kelvinlawrence.net/air-routes/objectProperty/route | airop |
| Special characters stripped (first word) | http://example.org/my-special_ns.v2/Item | my |
| Collision avoidance (1st namespace) | http://www.example.com/soccer/ontology/League | soccer |
| Collision avoidance (2nd namespace) | http://www.example.com/soccer/resource#EPL | soccer2 |
| Collision avoidance (3rd namespace) | http://www.example.com/soccer/class#Team | soccer3 |
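
The collision-avoidance rows can be reproduced with a small sketch like the one below. `uniquePrefixName` is a hypothetical helper, not the function in this PR, and the assumption is only that the numeral starts at 2 because the unsuffixed name counts as the first occurrence.

```typescript
// Hypothetical helper: returns `base`, or `base` plus the lowest free numeral.
function uniquePrefixName(base: string, taken: Set<string>): string {
  if (!taken.has(base)) return base;
  // Start at 2: the unsuffixed name is implicitly occurrence number 1.
  let i = 2;
  while (taken.has(`${base}${i}`)) i++;
  return `${base}${i}`;
}

// Three namespaces that would all be named "soccer" on their own.
const taken = new Set<string>();
const names: string[] = [];
for (const base of ["soccer", "soccer", "soccer"]) {
  const name = uniquePrefixName(base, taken);
  taken.add(name);
  names.push(name);
}
// names is now ["soccer", "soccer2", "soccer3"]
```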

Local value validation (accepted):
my_item, my-item, v2.0, caf%C3%A9, café, item·1

Local value validation (rejected → no prefix generated):
my item, a<b, a{b}, a|b, a^b, a`b, a\b
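
A deny-list check that agrees with the accepted and rejected examples above might look like the following. This is only a sketch: `isValidLocalValue` is a hypothetical name, the PR's actual validation may follow the RDF grammar more closely, and `\s` here rejects all whitespace, which is broader than plain spaces.

```typescript
// Characters named in the description as invalid: whitespace, angle brackets,
// curly braces, pipes, carets, backticks, and backslashes.
const INVALID_LOCAL_CHARS = /[\s<>{}|^`\\]/;

// Hypothetical helper; an empty local value is also treated as invalid here.
function isValidLocalValue(local: string): boolean {
  return local.length > 0 && !INVALID_LOCAL_CHARS.test(local);
}

const accepted = ["my_item", "my-item", "v2.0", "caf%C3%A9", "café", "item·1"];
const rejected = ["my item", "a<b", "a{b}", "a|b", "a^b", "a`b", "a\\b"];

const allAccepted = accepted.every(isValidLocalValue);
const anyRejectedPasses = rejected.some(isValidLocalValue);
```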

Related Issues

Check List

  • I confirm that my contribution is made under the terms of the Apache 2.0
    license.
  • I have run pnpm checks to ensure code compiles and meets standards.
  • I have run pnpm test to check if all tests are passing.
  • I have covered newly added functionality with unit tests if necessary.
  • I have added an entry in the Changelog.md.

kmcginnes commented Feb 24, 2026

  • Add de-duplication logic at the generatePrefixes level
  • Make splitIri reject invalid local values

- Add splitIri, generatePrefix, commonPrefixes utility modules
- Add PrefixLookup class for fast Map-based prefix lookups
- Rewrite generatePrefixes to use splitIri and generatePrefix
- Rewrite replacePrefixes to use PrefixLookup and splitIri
- Change usePrefixes/prefixesAtom to return PrefixLookup
- Remove __matches from PrefixTypeConfig
- Simplify saveConfigurationToFile and useImportConnectionFile
- Add backward compatibility tests for legacy __matches data

kmcginnes force-pushed the feature/schema/rdf-namespace-duplicates branch from bfa1d4f to 63011a9 on February 24, 2026 14:56
kmcginnes force-pushed the feature/schema/rdf-namespace-duplicates branch from 82efdf6 to 9dc2a68 on February 24, 2026 20:32

- Reject IRIs whose local value contains characters invalid in RDF
  prefixed names: spaces, angle brackets, curly braces, pipes, carets,
  backticks, and backslashes
- Allow underscores, hyphens, periods, percent-encoded sequences,
  Unicode letters, and middle dots
- Remove duplicate tests that were outside the describe block

kmcginnes marked this pull request as ready for review on February 24, 2026 21:26


codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 97.24138% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.41%. Comparing base (c5affc5) to head (1f447ac).
⚠️ Report is 74 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...es/graph-explorer/src/core/StateProvider/schema.ts | 92.85% | 1 Missing ⚠️ |
| ...ges/graph-explorer/src/utils/rdf/generatePrefix.ts | 97.91% | 1 Missing ⚠️ |
| ...s/graph-explorer/src/utils/rdf/generatePrefixes.ts | 96.29% | 1 Missing ⚠️ |
| packages/graph-explorer/src/utils/rdf/splitIri.ts | 96.15% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1539       +/-   ##
===========================================
+ Coverage   47.81%   65.41%   +17.59%     
===========================================
  Files         382      359       -23     
  Lines        8525     7968      -557     
  Branches     3159     2900      -259     
===========================================
+ Hits         4076     5212     +1136     
+ Misses       3070     1991     -1079     
+ Partials     1379      765      -614     

| Flag | Coverage Δ |
| --- | --- |
| unittests | 65.41% <97.24%> (+17.59%) ⬆️ |



/** Abbreviates a segment to the first letter of each word. */
function abbreviate(segment: string): string {
  const words = splitSegmentIntoWords(segment);

I think this entire function can be summarized as:

return sanitizePrefix(
  splitSegmentIntoWords(segment)
    .map(w => w.charAt(0))
    .join("")
)

IMHO it's easier to reason about when written this way, because the two existing returns don't look semantically equivalent. It even makes me wonder: do we skip sanitizePrefix when there is more than one word?
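
For reference, a self-contained version of that suggestion could look like the sketch below. The bodies of `splitSegmentIntoWords` and `sanitizePrefix` are not visible in the quoted diff, so both are hypothetical stand-ins here (split on non-alphanumerics and camelCase boundaries; lowercase and keep only ASCII letters and digits).

```typescript
// Hypothetical stand-in: split on non-alphanumerics and on lower-to-upper
// camelCase boundaries.
function splitSegmentIntoWords(segment: string): string[] {
  return segment
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter(word => word.length > 0);
}

// Hypothetical stand-in: lowercase and keep only ASCII letters and digits.
function sanitizePrefix(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9]/g, "");
}

/** Abbreviates a segment to the first letter of each word, always sanitized. */
function abbreviate(segment: string): string {
  return sanitizePrefix(
    splitSegmentIntoWords(segment)
      .map(word => word.charAt(0))
      .join(""),
  );
}

const dp = abbreviate("datatypeProperty"); // "dp"
const op = abbreviate("objectProperty"); // "op"
```

With a single return, sanitizing applies to every result. Note that a one-word segment abbreviates to its first letter in this form; whether that matches the original single-word branch can't be told from the quoted lines.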

}

return updatedPrefixes.values().toArray();
let i = 2;

IMO it's worth leaving a comment explaining the magic number 2 here.

}

// Reject URIs without a real origin (e.g. urn:, mailto:, custom-scheme:)
if (!url.origin || url.origin === "null") {

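The !url.origin || check is redundant: URL.origin is never an empty string, and for URIs without a real origin it is the string "null".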
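
A quick way to check this with the standard WHATWG `URL` class is sketched below. `hasRealOrigin` is a hypothetical helper name, not a function from the PR.

```typescript
// Hypothetical helper: true only when the IRI parses and has a non-opaque origin.
function hasRealOrigin(iri: string): boolean {
  try {
    // For schemes such as urn: and mailto:, origin serializes to the string "null".
    return new URL(iri).origin !== "null";
  } catch {
    return false; // not parseable as an absolute URL
  }
}

const origins = [
  "urn:isbn:0451450523",
  "mailto:someone@example.com",
  "http://www.schema.org/City",
].map(value => new URL(value).origin);
// origins is ["null", "null", "http://www.schema.org"]
```

Since `origin` is always a non-empty string, the comparison against "null" alone covers the cases named in the code comment.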

Development

Successfully merging this pull request may close these issues.

SPARQL namespaces seem to be reused

2 participants