This repository contains layout analysis and OCR output for 17th-century books, encoded as TEI files, together with their ODD.
These files were created with a pipeline.
An ODD is available. It is based on ODD17. The main modifications are:
- teiHeader now contains:
  - metadata obtained from the IIIF manifest and from SPARQL queries
  - additional information about the printer
  - additional information about the use of the SegmOnto vocabulary
- facsimile now contains all layout information about the different zones, lines, and baselines, with pixel coordinates and links to IIIF images
- text, which contains the full transcription, linked to the relevant line or zone
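
The link between the transcription and the layout can be illustrated with a minimal Python sketch. It is not taken from the repository: the sample document, the identifiers, the zone types, and the use of `lb` elements with a `facs` attribute are assumptions made for illustration, based on standard TEI attributes (`xml:id`, `facs`, `ulx`, `uly`, `lrx`, `lry`). The actual files may organise zones and lines differently, so check them against the ODD. The last function builds an image URL for a zone following the IIIF Image API 2.x pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`; the base URL is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified TEI document (not a file from the corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <facsimile>
    <surface xml:id="f1">
      <zone xml:id="f1_z1" type="MainZone" ulx="100" uly="200" lrx="900" lry="1200">
        <zone xml:id="f1_z1_l1" type="DefaultLine" ulx="110" uly="210" lrx="890" lry="250"/>
      </zone>
    </surface>
  </facsimile>
  <text>
    <body>
      <ab><lb facs="#f1_z1_l1"/>Hypothetical transcribed line</ab>
    </body>
  </text>
</TEI>"""


def index_zones(root):
    """Map each zone's xml:id to its type and bounding box (ulx, uly, lrx, lry)."""
    zones = {}
    for zone in root.iter(f"{{{TEI_NS}}}zone"):
        zone_id = zone.get(XML_ID)
        coords = [zone.get(k) for k in ("ulx", "uly", "lrx", "lry")]
        if zone_id is None or None in coords:
            continue  # skip zones without an id or a complete bounding box
        zones[zone_id] = {"type": zone.get("type"), "bbox": tuple(int(c) for c in coords)}
    return zones


def linked_lines(root):
    """Pair the text following each <lb/> with the zone its @facs points to."""
    zones = index_zones(root)
    pairs = []
    for lb in root.iter(f"{{{TEI_NS}}}lb"):
        target = (lb.get("facs") or "").lstrip("#")
        text = (lb.tail or "").strip()
        pairs.append((text, zones.get(target)))
    return pairs


def iiif_region_url(base, bbox):
    """Build a IIIF Image API 2.x URL for a bounding box (region is x,y,w,h)."""
    ulx, uly, lrx, lry = bbox
    return f"{base}/{ulx},{uly},{lrx - ulx},{lry - uly}/full/0/default.jpg"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for text, zone in linked_lines(root):
        print(text, zone)
```

Indexing the zones once and resolving each `facs` pointer against that index keeps the lookup simple, and a pointer with no matching zone yields `None` instead of raising an error.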
Documents have been encoded by Claire Jahan with the help of Simon Gabay, as part of the E-ditiones project.
Claire Jahan: claire.jahan[at]chartes.psl.eu
Simon Gabay: Simon.Gabay[at]unige.ch
Claire Jahan, Simon Gabay. 2020. CORPUS17+ - Corpus of TEI encoded 17th French prints, Paris/Geneva: ENS Paris/UniGE, 2021, https://github.com/e-ditiones/CORPUS17plus.
