In Coptic Scriptorium, some attributes can appear on more than one XML element. For example, xml:lang can appear on either <norm> or <morph>, as in this made-up example:
<norm_group orig_group="ⲡⲁⲅⲅⲉⲗⲟⲥ" norm_group="ⲡⲁⲅⲅⲉⲗⲟⲥ">
<norm xml:id="u1" pos="ART" lemma="ⲡ" func="det" head="#u2" orig="ⲡ" norm="ⲡ">
ⲡ
</norm>
<norm xml:id="u2" pos="N" lemma="ⲁⲅⲅⲉⲗⲟⲥ" xml:lang="Greek" func="root" orig="ⲁⲅⲅⲉⲗⲟⲥ" norm="ⲁⲅⲅⲉⲗⲟⲥ">
ⲁⲅⲅⲉⲗⲟⲥ
</norm>
</norm_group>
<norm_group orig_group="ⲛⲧⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" norm_group="ⲛⲧⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ">
<norm xml:id="u3" pos="CREL" lemma="ⲉⲧⲉⲣⲉ" func="mark" head="#u4" orig="ⲛⲧ" norm="ⲛⲧ">
ⲛⲧ
</norm>
<norm xml:id="u4" pos="N" lemma="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" func="acl" head="#u2" orig="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ" norm="ⲙⲛⲧⲁⲅⲅⲉⲗⲟⲥ">
<morph morph="ⲙⲛⲧ">
ⲙⲛⲧ
</morph>
<morph xml:lang="Greek" morph="ⲁⲅⲅⲉⲗⲟⲥ">
ⲁⲅⲅⲉⲗⲟⲥ
</morph>
</norm>
</norm_group>
The desired EtherCalc behavior for this NLP output is to collapse both xml:lang annotations into a single 'lang' column, but this behavior is currently hard-coded in ether.py. Another project might conceivably want the normal per-element output columns, such as morph_xml_id and norm_xml_id, to be kept distinct.
The names of the annotations that are collapsed this way should be configurable.
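
To make the request concrete, here is a minimal sketch of what a configurable collapse could look like. It does not reflect the actual code in ether.py: the `COLLAPSE` mapping, `column_name`, and `collect_columns` are hypothetical names chosen for illustration, and the default `<element>_<attribute>` column naming is an assumption based on the column names mentioned above.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the built-in xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical configuration: attribute name -> single collapsed column name.
# An empty mapping disables collapsing entirely.
COLLAPSE = {"xml:lang": "lang"}


def column_name(element, attribute, collapse=COLLAPSE):
    """Return the spreadsheet column for an attribute on a given element."""
    if attribute in collapse:
        # Collapsed: the same column regardless of the carrying element.
        return collapse[attribute]
    # Assumed default: one column per element/attribute pair,
    # e.g. norm + xml:id -> norm_xml_id.
    return element + "_" + attribute.replace(":", "_")


def collect_columns(xml_text, collapse=COLLAPSE):
    """Group attribute values by output column, in document order."""
    root = ET.fromstring(xml_text)
    columns = {}
    for el in root.iter():
        for key, value in el.attrib.items():
            # Turn '{namespace}lang' back into 'xml:lang' for lookup.
            attr = key.replace(XML_NS, "xml:")
            name = column_name(el.tag, attr, collapse)
            columns.setdefault(name, []).append(value)
    return columns


# Reduced version of the example above, wrapped in a root element.
SAMPLE = (
    '<doc>'
    '<norm xml:lang="Greek" xml:id="u2">ⲁⲅⲅⲉⲗⲟⲥ</norm>'
    '<norm xml:id="u4"><morph xml:lang="Greek">ⲁⲅⲅⲉⲗⲟⲥ</morph></norm>'
    '</doc>'
)
```

With the default mapping, `collect_columns(SAMPLE)` puts both xml:lang values in one 'lang' column. Passing `collapse={}` keeps `norm_xml_lang` and `morph_xml_lang` as separate columns, which is what a project that wants per-element columns would use.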