Parallel corpus (diplomatic vs normalised) of 17th c. French texts
- WARNING: This repository is now deprecated. See https://github.com/FreEM-corpora/FreEMnorm for the new repo. The corpus is available in the corpus_tsv folder.
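NMT toolkits usually expect a parallel corpus as two line-aligned plain-text files. The sketch below splits one TSV file into such a pair. It assumes, without having checked the actual files, that each line holds a diplomatic segment and its normalised counterpart separated by a single tab, in that order. split_parallel_tsv and the .dipl/.norm extensions are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Assumption: each TSV line is "<diplomatic>TAB<normalised>".
split_parallel_tsv() {
  local tsv="$1" prefix="$2"
  # cut splits on tabs by default; line N of both outputs stays aligned
  cut -f1 "$tsv" > "${prefix}.dipl"
  cut -f2 "$tsv" > "${prefix}.norm"
}

# Usage: split_parallel_tsv corpus_tsv/some_file.tsv train
```

Keeping both sides line-aligned matters more than the file names: a single dropped or extra line shifts every following sentence pair.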
A detailed list of the content is available here.
Transcriptions are almost diplomatic. The long ſ is kept (plaiſir, not plaisir). Ligatures that have disappeared from use (ſt, st, ct) are expanded, whereas those still used in contemporary French (œ, æ) are kept.
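The ligature convention can be illustrated with a small filter. This is a hypothetical helper, not the project's own transcription tooling: it expands the two obsolete ligatures that have standard Unicode code points (ﬅ, U+FB05, and ﬆ, U+FB06) and leaves ſ, œ and æ untouched. The ct ligature has no standard Unicode code point, so it is not handled here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): expand obsolete
# ligatures while keeping long s (ſ), œ and æ as they are.
expand_obsolete_ligatures() {
  # U+FB05 (long s + t ligature) -> ſt ; U+FB06 (s + t ligature) -> st
  sed -e 's/ﬅ/ſt/g' -e 's/ﬆ/st/g'
}

# Reads stdin, writes stdout; prints: il eſt en cœur
printf 'il eﬅ en cœur\n' | expand_obsolete_ligatures
```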
To use the tool, you need to:
- Download and install NMTPYTORCH (you can find some help here).
- Download the NORM17 folder from this repository.
- Prepare the text by running ./prepare_data.bash <NAME_OF_FILE>
- Normalise the text by running ./translate_file.bash <NAME_OF_FILE.tok>
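The last two steps can be chained in a small wrapper. This is a sketch, not a script shipped with the repository: normalise_file is a hypothetical name, it is meant to be run from inside the NORM17 folder, and it assumes that prepare_data.bash writes its tokenised output to <NAME_OF_FILE>.tok next to the input, as the naming in the steps above suggests.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repository) chaining the two steps.
# Run it from inside the NORM17 folder.
# Assumption: prepare_data.bash writes its tokenised output to "<file>.tok".
normalise_file() {
  local input="$1"
  if [ ! -f "$input" ]; then
    echo "no such file: $input" >&2
    return 1
  fi
  ./prepare_data.bash "$input" || return 1
  local tok="${input}.tok"
  if [ ! -f "$tok" ]; then
    echo "expected tokenised file not found: $tok" >&2
    return 1
  fi
  ./translate_file.bash "$tok"
}

# Usage: normalise_file my_text.txt
```

Checking for the .tok file before translating gives a clear error if the naming assumption does not hold for your version of the scripts.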
If you want to contribute, you can do so by cloning the repository and sending us a pull request, or by sending an email to simon.gabay[at]unige.ch.
Additional data and corrections have been provided by Philippe Gambette (GitHub) and Jonathan Poinhos.
If you use this corpus, please cite one of the following publications, preferably the latest.
@inproceedings{gabay:hal-02276150,
TITLE = {{A Workflow For On The Fly Normalisation Of 17th c. French}},
AUTHOR = {Gabay, Simon and Riguet, Marine and Barrault, Lo{\"i}c},
URL = {https://hal.archives-ouvertes.fr/hal-02276150},
BOOKTITLE = {{DH2019}},
ADDRESS = {Utrecht, Netherlands},
ORGANIZATION = {{ADHO}},
YEAR = {2019},
MONTH = Jul,
KEYWORDS = {17th Century France ; Parallel corpus building},
PDF = {https://hal.archives-ouvertes.fr/hal-02276150/file/DH2019_final.pdf},
HAL_ID = {hal-02276150},
HAL_VERSION = {v1},
}

@inproceedings{gabay:hal-02596669,
TITLE = {{Traduction automatique pour la normalisation du fran{\c c}ais du XVII e si{\`e}cle}},
AUTHOR = {Gabay, Simon and Barrault, Lo{\"i}c},
URL = {https://hal.archives-ouvertes.fr/hal-02596669},
BOOKTITLE = {{TALN 2020}},
ADDRESS = {Nancy, France},
ORGANIZATION = {{ATALA}},
SERIES = {27{\`e}me Conf{\'e}rence sur le Traitement Automatique des Langues Naturelles},
YEAR = {2020},
MONTH = Jun,
KEYWORDS = {Normalisation ; 17th c French ; Neural Machine Translation (NMT) ; Statistical Machine Translation (SMT) ; Digital humanities ; Humanit{\'e}s num{\'e}riques ; Fran{\c c}ais classique ; Traduction automatique neuronale ; Traduction automatique statistique},
PDF = {https://hal.archives-ouvertes.fr/hal-02596669/file/main.pdf},
HAL_ID = {hal-02596669},
HAL_VERSION = {v1},
}

Please keep me posted if you use this data! simon.gabay[at]unige.ch
This work is licensed under a Creative Commons Attribution 4.0 International Licence.
