BPE the identifier dataset and publish the new model #87

@vmarkovtsev

As discussed at the reading club, we should run https://github.com/google/sentencepiece on our identifier dataset and produce a compact ASDF model in Modelforge that can embed any identifier.
That model can later be integrated somewhere near our ID splitter.
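
For anyone unfamiliar with BPE, here is a minimal stdlib-only sketch of what byte-pair encoding does to identifiers. It is not sentencepiece and not the proposed model: the function names (`merge_pair`, `learn_bpe`, `encode`) and the sample identifiers are hypothetical and chosen only for illustration. The real run would use sentencepiece on the full dataset.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(identifiers, num_merges):
    """Learn up to `num_merges` merge rules from a list of identifiers."""
    # Start from single characters; the Counter keeps each identifier's frequency.
    vocab = Counter(tuple(ident) for ident in identifiers)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by identifier frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every identifier is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(identifier, merges):
    """Split an identifier into subword units by replaying the merges in learned order."""
    symbols = tuple(identifier)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy data, standing in for the identifier dataset.
sample_identifiers = ["get_name", "get_value", "set_name", "set_value"]
merges = learn_bpe(sample_identifiers, num_merges=10)
print(encode("get_id", merges))  # an identifier not seen during training
```

Because the merges only ever join adjacent characters, any identifier can be encoded, including unseen ones, which is what makes a BPE vocabulary a candidate for embedding arbitrary identifiers.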
