gedcom7code/c-converter

GEDCOM 5.5.1 to GEDCOM 7.0 converter

GEDCOM 7.0 is a breaking change from GEDCOM 5.5.1. This means that 5.5.1 files cannot be parsed as-is as if they were 7.0 files. This project is a zero-dependency public-domain ANSI-C implementation of a 5.5.1 to 7.0 converter. C was chosen because it has very few features, so it should be easy to port the code to other languages; and because many other languages have methods for calling C code natively.

Current status:

  • Single-pass operations
    • Detect character encodings, as documented in ELF Serialisation.
    • Convert to UTF-8
    • Normalize line whitespace, including stripping leading spaces
    • Remove CONC
    • Normalize case of tags
    • Limit character set of cross-reference identifiers
    • Fix @ usage
    • Convert LANG payloads to BCP 47 tags, using FHISO's mapping
    • Convert DATE
      • replace date_phrase with PHRASE structure
      • replace calendar escapes with calendar tags
      • change BC and B.C. to BCE and remove if found in unsupported calendars
      • replace dual years with single years and PHRASEs
      • replace just-year dual years in unqualified date with BET/AND
    • Convert AGE
      • change age words to canonical forms (stillborn as 0y, child as < 8y, infant as < 1y) with PHRASEs
      • Normalize spacing in AGE payloads
      • add missing y
    • Convert MEDI.FORM payloads to media types
    • (deferred) Convert INDI.NAME
      • (deferred) replace /surname/ with name part
      • (deferred) combine payload and parts
      • (deferred) convert _RUFNAME to RUFNAME
    • (deferred) Convert PLAC structures to PLACE records and WHERE pointers thereto
    • Enumerated values
      • Normalize case
      • Convert user-text to PHRASEs
    • change SOUR with text payload into pointer to SOUR with NOTE
    • change NOTE records, and NOTE structures with a pointer payload, into SNOTE
    • change OBJE with no payload to pointer to new OBJE record
    • Convert FONE and ROMN to TRAN and their TYPEs to BCP-47 LANGs
    • tag renaming, including
      • EMAI, _EMAIL → EMAIL
      • FORM.TYPE → FORM.MEDI
      • (deferred) _SDATE → SDATE -- _SDATE is also used as "accessed at" date for web resources by some applications so this change is not universally correct
      • _UID → UID
      • _ASSO → ASSO
      • _CRE, _CREAT → CREA
      • _DATE → DATE
      • other?
    • ASSO.RELA → ASSO.ROLE (changing payload OTHER + PHRASE)
    • change RFN, RIN, and AFN to EXID
    • change _FSFTID, _APID to EXID
    • remove SUBN, HEAD.FILE, HEAD.CHAR
      • (deferred) HEAD.PLAC was originally on this list, but has been deferred to a later version
    • change FILE payloads into URLs
      • Windows-style \ becomes /
      • Windows drive letter C:\WINDOWS becomes file:///c:/WINDOWS
      • POSIX-style /User/foo becomes file:///User/foo
    • update the GEDC.VERS to 7.0
    • (extra) change string-valued INDI.ALIA into NAME with TYPE AKA
    • (5.5) change base64-encoded OBJE into GEDZIP
    • Change any illegal tag XYZ into _EXT_XYZ
  • two-pass operations
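
As an illustration of the FILE payload rules listed above, here is a minimal sketch of how a path could be turned into a URL. The function name `file_payload_to_url` is hypothetical and is not taken from the converter's source. The sketch assumes that relative paths only have their backslashes replaced, and it does not attempt percent-encoding or UNC paths, which the real converter may handle differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the converter's actual API).
 * Returns a newly malloc'd string that the caller must free,
 * or NULL if allocation fails. */
char *file_payload_to_url(const char *path) {
    size_t len;
    char *out, *w;
    int has_drive;

    assert(path != NULL);
    len = strlen(path);
    /* The longest prefix grows the text by 8 bytes; +1 for the terminator. */
    out = malloc(len + 9);
    if (out == NULL) return NULL;
    w = out;

    has_drive = len >= 3 && isalpha((unsigned char)path[0])
             && path[1] == ':' && (path[2] == '\\' || path[2] == '/');
    if (has_drive) {
        /* C:\WINDOWS -> file:///c:/WINDOWS (drive letter lowercased) */
        w += sprintf(w, "file:///%c:", tolower((unsigned char)path[0]));
        path += 2;
    } else if (path[0] == '/') {
        /* /User/foo -> file:///User/foo */
        w += sprintf(w, "file://");
    }
    /* Copy the remainder, turning Windows-style \ into /. */
    for (; *path != '\0'; path++)
        *w++ = (char)((*path == '\\') ? '/' : *path);
    *w = '\0';
    return out;
}
```

Called with `C:\WINDOWS` this returns `file:///c:/WINDOWS`, and with `/User/foo` it returns `file:///User/foo`, matching the two examples in the list.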

Usage

Building using the Makefile

Edit Makefile as needed; likely changes include

  • Change from CC := clang to your C compiler
  • If on Windows, change the target from ged5to7 to ged5to7.exe

Then run make.

Building using Visual Studio

To instead build using Visual Studio, simply open the c-converter.sln file with Visual Studio and build the solution normally.

Running

To run, execute the resulting ged5to7. Run ged5to7 --help for a list of command-line options.

Design Notes

The code is designed to be thread-safe (no mutable globals or static locals) though threading has not yet been added.
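
A rough sketch of what "no mutable globals or static locals" can look like in practice: all per-conversion state lives in a struct that the caller owns and passes in, so two conversions never share data. The names `ConvState` and `normalize_tag_case` are hypothetical and do not come from the converter's source, and the sketch assumes that normalizing tag case means uppercasing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical per-conversion state; the converter's real structs differ. */
typedef struct {
    size_t tags_seen;    /* tags processed by this conversion */
    size_t tags_changed; /* tags whose case had to be changed */
} ConvState;

/* Uppercase a tag in place, recording the work in the caller's state
 * rather than in a global or static counter. */
void normalize_tag_case(ConvState *st, char *tag) {
    char *p;
    int changed = 0;

    assert(st != NULL && tag != NULL);
    for (p = tag; *p != '\0'; p++) {
        if (islower((unsigned char)*p)) {
            *p = (char)toupper((unsigned char)*p);
            changed = 1;
        }
    }
    st->tags_seen++;
    if (changed) st->tags_changed++;
}
```

Because the function touches only its arguments, separate threads could each run a conversion with their own `ConvState` without any locking.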

The code is currently a first draft, written by someone who does not usually write large code bases that others read. It has inconsistent naming (e.g., ged_destroy_event vs changePayloadToDynamic), some shortcuts (e.g., some structs are allocated as longs and cast to struct), inconsistent style (e.g., three different ways to emit locally-created GedEvents), etc. In some places effort was spent on making it efficient; in others it is definitely not as efficient as it easily could be. Overall, the code needs a major refactor before it is easy to read.
