This repository contains supplementary data and code to help reproduce the results of the paper of the same name. It is currently available in a somewhat unpolished form, but will be fully documented in the near future.
- `populations`: contains .tsv files with language-population data and other population-relevant data.
- `task_results`: contains .tsv files with the aggregated results for each NLP task.
- `economic_indicators_data`: contains files from WITS (under `wits_en_trade_summary_allcountries_allyears`), converted to map to languages instead of countries. Important files are:
  - `languages_to_gdp.tsv` for the monolingual mapping of languages to associated GDP estimations.
  - `bilingual_indicators.tsv` for the bilingual mapping of languages to associated bilingual indicator (Imports, Exports) estimations. Also includes the triangulated BLEU scores for the language pair.
- `figs`: contains correlation figures, created with the `plot_*_correlations.py` scripts.
- `area-classifier`: contains data and code for a classifier of areas.
- `counterfactuals.py`: computes the counterfactual scenarios presented in the paper.
- `constants.py`: contains functions to read in all necessary data, which are used in other files to run the metrics estimations and produce the plots.
- `economic_indicators.py`: contains a function to read in economic indicators (called by `constants.py`).
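
As a rough illustration of how the .tsv files could be combined, the sketch below reads a language-to-GDP table and a task-results table, then computes a Pearson correlation over the languages they share. It is not the code from `constants.py` or the `plot_*_correlations.py` scripts. The column names (`language`, `gdp`, `score`), the helper names, and the toy in-memory values are all assumptions made for illustration; the real files may be laid out differently.

```python
import csv
import io
import math


def read_tsv_column(handle, key_col, value_col):
    """Return {key: float(value)} for rows of a tab-separated file with a non-empty value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key_col]: float(row[value_col]) for row in reader if row[value_col]}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlate(gdp_by_lang, score_by_lang):
    """Correlate the two mappings over the languages present in both."""
    shared = sorted(set(gdp_by_lang) & set(score_by_lang))
    return pearson([gdp_by_lang[lang] for lang in shared],
                   [score_by_lang[lang] for lang in shared])


# Toy stand-ins for the real files; column names and values are hypothetical.
gdp_tsv = io.StringIO("language\tgdp\naaa\t1.0\nbbb\t2.0\nccc\t3.0\n")
scores_tsv = io.StringIO("language\tscore\naaa\t10.0\nbbb\t20.0\nccc\t30.0\nddd\t5.0\n")

gdp = read_tsv_column(gdp_tsv, "language", "gdp")
scores = read_tsv_column(scores_tsv, "language", "score")
r = correlate(gdp, scores)  # "ddd" is dropped because it has no GDP entry
```

For a file on disk, the `io.StringIO` objects would be replaced by something like `open(path, newline="", encoding="utf-8")`, with the column names adjusted to match the actual header row.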
- Add general metric calculation script
- Data paths are all absolute and need to be corrected
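
One possible way to address the absolute-path item above is to build every data path from a single configurable root. The sketch below is only a suggestion, not how the repository currently works. The helper name `data_path` and the environment variable `DATA_ROOT` are hypothetical, and the file name in the usage line is taken from the listing above.

```python
import os
from pathlib import Path


def data_path(relative, root=None):
    """Join a repository-relative path onto a configurable root directory.

    Resolution order for the root (all assumptions of this sketch):
    1. the explicit `root` argument,
    2. the hypothetical DATA_ROOT environment variable,
    3. the current working directory.
    """
    if root is not None:
        base = Path(root)
    else:
        base = Path(os.environ.get("DATA_ROOT", Path.cwd()))
    return base / relative


# Example: locate a file under a checkout at an arbitrary location.
gdp_file = data_path("economic_indicators_data/languages_to_gdp.tsv", root="repo")
```

With a helper like this, scripts would pass only repository-relative paths, and the location of the checkout could be changed in one place.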