Extension Tags Converted to URIs During Parsing
Bug Description
When parsing GEDCOM 7 files with extension tags declared in SCHMA, the parser converts the extension tag names to their URI values, making it impossible to identify which extension tag was used.
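For context, the SCHMA block in HEAD is a plain tag-to-URI table built from its `2 TAG` lines. The sketch below is a standalone illustration, not the gedcom7 library's code, and `parse_schma` is a hypothetical helper that assumes well-formed `<level> <tag> <payload>` lines. It shows that the URI can be looked up from the tag at any time, so the parser does not need to overwrite the tag to provide it.

```python
def parse_schma(lines):
    """Collect the extension-tag -> URI mapping from '2 TAG' lines under '1 SCHMA'.

    Hypothetical helper for illustration only; assumes lines of the form
    '<level> <tag> <payload>' without cross-reference identifiers.
    """
    mapping = {}
    in_schma = False
    for line in lines:
        level, _, rest = line.partition(" ")
        tag, _, payload = rest.partition(" ")
        if level in ("0", "1"):
            # Entering or leaving a level-1 SCHMA substructure.
            in_schma = level == "1" and tag == "SCHMA"
        elif in_schma and level == "2" and tag == "TAG":
            ext_tag, _, uri = payload.partition(" ")
            mapping[ext_tag] = uri
    return mapping


header = """0 HEAD
1 GEDC
2 VERS 7.0
1 SCHMA
2 TAG _TEST https://example.com/test""".splitlines()

schma = parse_schma(header)
print(schma)  # {'_TEST': 'https://example.com/test'}
```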
Steps to Reproduce
```python
import gedcom7

test_gedcom = """0 HEAD
1 GEDC
2 VERS 7.0
1 SCHMA
2 TAG _TEST https://example.com/test
0 @I1@ INDI
1 NAME John /Doe/
1 _TEST Some test data
0 TRLR"""

structures = list(gedcom7.loads(test_gedcom))
for struct in structures:
    if struct.tag == "INDI":
        for child in struct.children:
            print(f"Tag: {child.tag}")
```
Expected Behavior
The child tag should be _TEST (the actual tag used in the file).
Actual Behavior
The child tag is https://example.com/test (the URI from the SCHMA declaration).
Impact
This makes it impossible to:
- Identify which extension tag was actually used in the file
- Process extension tags differently based on their tag name
- Export GEDCOM 7 files with the same extension tags that were imported
Additional Information
- Version: gedcom7 0.4.0
- Python: 3.13.5
- This behavior is particularly problematic when multiple extension tags map to the same URI for different purposes
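To illustrate the export problem: once only the URI is kept, writing the file back requires reversing the SCHMA table, and that reverse lookup is ambiguous whenever two tags share a URI. The SCHMA contents and the `invert_schma` helper below are hypothetical, chosen only to show the ambiguity.

```python
from collections import defaultdict


def invert_schma(mapping):
    """Group extension tags by the URI they were declared with (hypothetical helper)."""
    by_uri = defaultdict(list)
    for tag, uri in mapping.items():
        by_uri[uri].append(tag)
    return dict(by_uri)


# Hypothetical SCHMA in which two tags are declared with the same URI.
schma = {"_TEST": "https://example.com/test", "_TST": "https://example.com/test"}
by_uri = invert_schma(schma)

# A writer that only has the URI cannot tell which tag the file originally used.
print(by_uri["https://example.com/test"])  # ['_TEST', '_TST']
```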
Suggested Fix
The parser should preserve the original tag name, perhaps storing both the tag and its URI:
- `tag` property: The actual tag from the file (e.g., `"_TEST"`)
- `uri` or `type_id` property: The URI from SCHMA (e.g., `"https://example.com/test"`)
This would maintain backward compatibility while allowing access to the original tag name.
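A minimal sketch of the suggested shape, assuming a simple dataclass. `Structure` and `make_structure` are hypothetical names, not the gedcom7 library's actual types. The tag is stored exactly as written and the URI is resolved alongside it, so neither value overwrites the other.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Structure:
    """Hypothetical parsed structure that keeps both the tag and its URI."""
    tag: str                   # tag exactly as written in the file, e.g. "_TEST"
    uri: Optional[str] = None  # URI from SCHMA; None if the tag is not declared there
    value: Optional[str] = None
    children: list["Structure"] = field(default_factory=list)


def make_structure(tag, value, schma):
    """Attach the SCHMA URI (if any) without replacing the original tag."""
    return Structure(tag=tag, uri=schma.get(tag), value=value)


schma = {"_TEST": "https://example.com/test"}
ext = make_structure("_TEST", "Some test data", schma)
print(ext.tag)  # _TEST
print(ext.uri)  # https://example.com/test
```

Code that currently compares against the URI could switch to `uri`, while `tag` remains available for identifying and re-exporting the extension tag.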