@thammegowda (Owner) commented Mar 12, 2025

New features

  • Support for datasets from Hugging Face (e.g. wmt24++)
  • Preliminary support for meta fields, including doc_id, domain, and seg_id
  • mtdata score supports quality estimation (QE) scoring. It currently supports pymarian, but is designed so that other scorers can be integrated via CLI.
  • mtdata.map is an optimized, subprocess-based mapper that can score many small files using a single subprocess, reducing the number of times a QE model is loaded
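
To illustrate the idea behind the subprocess-based mapper, here is a minimal sketch of keeping one worker process alive and streaming several small batches through it. This is not the mtdata.map implementation: `LineMapper`, `map_batches`, and `WORD_COUNT_WORKER` are hypothetical names, and the worker is a stand-in that prints a word count per line where a real setup would load a QE model once and print one score per line.

```python
import subprocess
import sys


class LineMapper:
    """Hypothetical helper: one long-lived worker, one output line per input line."""

    def __init__(self, cmd):
        # Start the worker once; an expensive model load would happen here only once.
        self.proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )

    def map_line(self, line):
        # Send a single line and block until the worker answers with a single line.
        self.proc.stdin.write(line.rstrip("\n") + "\n")
        self.proc.stdin.flush()
        out = self.proc.stdout.readline()
        if not out:
            raise RuntimeError("worker exited before producing a result")
        return out.rstrip("\n")

    def close(self):
        # Closing stdin signals end of input so the worker can exit cleanly.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


# Stand-in worker (assumption): prints the number of words in each input line.
# It flushes after every line so the parent never waits on a buffered reply.
WORD_COUNT_WORKER = [
    sys.executable,
    "-u",
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(len(line.split()), flush=True)\n",
]


def map_batches(cmd, batches):
    """Map every batch (e.g. the lines of one small file) through a single worker."""
    mapper = LineMapper(cmd)
    try:
        return {
            name: [mapper.map_line(line) for line in lines]
            for name, lines in batches.items()
        }
    finally:
        mapper.close()


if __name__ == "__main__":
    demo = {"a.txt": ["hello world", "one two three"], "b.txt": ["x"]}
    print(map_batches(WORD_COUNT_WORKER, demo))
```

The strict one-line-in, one-line-out protocol keeps the sketch simple and avoids pipe deadlocks, at the cost of throughput; a production mapper would likely batch requests.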

Housekeeping

  • Migrate from setup.py to pyproject.toml; support optional dependencies
  • Make Hugging Face datasets an optional dependency
  • Add the mtdata index subcommand; deprecate mtdata --re-index <cmd>

@thammegowda thammegowda merged commit 35c7a6f into main Mar 31, 2025
8 of 10 checks passed
@thammegowda thammegowda changed the title from "[WIP] v0.4.3 : WMT25" to "v0.4.3 : WMT25" Mar 31, 2025
@thammegowda thammegowda deleted the wmt25 branch April 27, 2025 20:41