CORPUS17+ - Corpus of TEI encoded 17th French prints

This repository contains layout analysis and OCR output for 17th-century books, encoded as TEI files, together with their ODD.

Pipeline

These files were produced by a two-step pipeline:

  1. OCR17+ for HTR/OCR
  2. extractor for the ALTO-to-TEI transformation
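
To make the second step more concrete, here is a minimal sketch of what an ALTO-to-TEI conversion can look like. It is not the code of extractor: the sample ALTO fragment is hand-written, the helper names (`tei_tag`, `box`, `baseline_points`, `alto_to_tei`) are hypothetical, and the choice to store baselines in a TEI `path` element with a `points` attribute is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, hand-written ALTO fragment used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="b1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="400">
          <TextLine ID="l1" HPOS="110" VPOS="210" WIDTH="1400" HEIGHT="60"
                    BASELINE="110 260 1510 262">
            <String CONTENT="Premiere ligne"/>
          </TextLine>
          <TextLine ID="l2" HPOS="110" VPOS="290" WIDTH="1380" HEIGHT="60"
                    BASELINE="110 340 1490 341">
            <String CONTENT="Seconde ligne"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def tei_tag(name):
    """Qualify an element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{name}"


def box(el):
    """Convert ALTO HPOS/VPOS/WIDTH/HEIGHT into TEI ulx/uly/lrx/lry."""
    x, y = int(el.get("HPOS")), int(el.get("VPOS"))
    w, h = int(el.get("WIDTH")), int(el.get("HEIGHT"))
    return {"ulx": str(x), "uly": str(y), "lrx": str(x + w), "lry": str(y + h)}


def baseline_points(raw):
    """Turn 'x1 y1 x2 y2' into 'x1,y1 x2,y2' (assumed target format)."""
    nums = raw.split()
    return " ".join(f"{x},{y}" for x, y in zip(nums[0::2], nums[1::2]))


def alto_to_tei(alto_xml):
    """Build a small TEI tree (facsimile + text) from one ALTO page."""
    ns = {"a": ALTO_NS}
    page = ET.fromstring(alto_xml).find(".//a:Page", ns)
    root = ET.Element(tei_tag("TEI"))
    facsimile = ET.SubElement(root, tei_tag("facsimile"))
    surface = ET.SubElement(
        facsimile,
        tei_tag("surface"),
        {"ulx": "0", "uly": "0", "lrx": page.get("WIDTH"), "lry": page.get("HEIGHT")},
    )
    body = ET.SubElement(ET.SubElement(root, tei_tag("text")), tei_tag("body"))
    for block in page.iterfind(".//a:TextBlock", ns):
        block_id = block.get("ID")
        zone = ET.SubElement(surface, tei_tag("zone"), {XML_ID: block_id, **box(block)})
        ab = ET.SubElement(body, tei_tag("ab"), {"facs": "#" + block_id})
        for line in block.iterfind("a:TextLine", ns):
            line_id = line.get("ID")
            line_zone = ET.SubElement(zone, tei_tag("zone"), {XML_ID: line_id, **box(line)})
            if line.get("BASELINE"):
                points = baseline_points(line.get("BASELINE"))
                ET.SubElement(line_zone, tei_tag("path"), {"points": points})
            # Each transcribed line points back to its zone through @facs.
            lb = ET.SubElement(ab, tei_tag("lb"), {"facs": "#" + line_id})
            lb.tail = " ".join(s.get("CONTENT") for s in line.iterfind("a:String", ns))
    return root


tei_root = alto_to_tei(SAMPLE_ALTO)
ET.register_namespace("", TEI_NS)
print(ET.tostring(tei_root, encoding="unicode"))
```

The sketch keeps layout and transcription apart: coordinates go under `facsimile`, text goes under `text`, and the two are joined only by identifiers.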

TEI documentation

An ODD is available. It is based on ODD17. The main modifications are:

  1. teiHeader now contains:
     • metadata obtained from the IIIF manifest and through SPARQL queries
     • additional information about the printer
     • additional information about the use of the SegmOnto vocabulary
  2. facsimile now contains all layout information about the zones, lines, and baselines, with pixel coordinates and links to the IIIF images.
  3. text now contains the full transcription, with each part linked to the relevant line or zone.
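
As an illustration of how the layout and the transcription can be read together, the sketch below pairs each transcribed line with a IIIF crop of its zone. Everything specific in it is an assumption rather than a description of the actual files: the sample TEI is hand-written, the image base URL is a placeholder, the use of `lb` with a `facs` pointer and of `type` for SegmOnto labels is a guess at the encoding, and the URL follows the IIIF Image API pattern with `full` as the size. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI fragment; the real files may be organised differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="f1">
      <graphic url="https://example.org/iiif/page1"/>
      <zone xml:id="z1" type="MainZone" ulx="100" uly="200" lrx="1600" lry="600">
        <zone xml:id="l1" type="DefaultLine" ulx="110" uly="210" lrx="1510" lry="270"/>
        <zone xml:id="l2" type="DefaultLine" ulx="110" uly="290" lrx="1490" lry="350"/>
      </zone>
    </surface>
  </facsimile>
  <text>
    <body>
      <ab facs="#z1"><lb facs="#l1"/>Premiere ligne<lb facs="#l2"/>Seconde ligne</ab>
    </body>
  </text>
</TEI>"""


def index_zones(root):
    """Map each zone's xml:id to (image base URL, bounding box)."""
    zones = {}
    for surface in root.iter(TEI + "surface"):
        graphic = surface.find(TEI + "graphic")
        base = graphic.get("url") if graphic is not None else None
        for zone in surface.iter(TEI + "zone"):
            bbox = tuple(int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
            zones[zone.get(XML_ID)] = (base, bbox)
    return zones


def iiif_region_url(base, bbox):
    """Build a IIIF Image API URL cropping the image to the given box."""
    ulx, uly, lrx, lry = bbox
    region = f"{ulx},{uly},{lrx - ulx},{lry - uly}"  # x,y,width,height
    return f"{base}/{region}/full/0/default.jpg"


def lines_with_images(root):
    """Yield (transcription, crop URL) for every line break with @facs."""
    zones = index_zones(root)
    for lb in root.iter(TEI + "lb"):
        zone_id = lb.get("facs").lstrip("#")
        base, bbox = zones[zone_id]
        yield (lb.tail or "").strip(), iiif_region_url(base, bbox)


for text, url in lines_with_images(ET.fromstring(SAMPLE_TEI)):
    print(text, "->", url)
```

Resolving the pointer through a dictionary keyed on `xml:id` keeps the lookup simple and works the same way for zones and for lines.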

Credits

Documents have been encoded by Claire Jahan with the help of Simon Gabay, as part of the E-ditiones project.

Contact

Claire Jahan : claire.jahan[at]chartes.psl.eu

Simon Gabay : Simon.Gabay[at]unige.ch

Licence

This repository is licensed under CC-BY.

Cite this repository

Claire Jahan, Simon Gabay. 2020. CORPUS17+ - Corpus of TEI encoded 17th French prints. Paris/Geneva: ENS Paris/UniGE, 2021, https://github.com/e-ditiones/CORPUS17plus.
