Tokenizer evaluation #146

@joenaess

Felix: I think it's definitely a good idea to decide on a preliminary vocab size and a reasonable vocab size range, but I would also like to vary the vocab size and investigate its effect on what is called internal and external performance below. This is similar to what I have done in the past: [https://arxiv.org/abs/2304.14780]
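
A rough sketch of what such a vocab size sweep could look like. Everything here is an assumption for illustration: the toy BPE trainer, the helper names (`train_bpe`, `fertility`, `sweep`), and the choice of fertility (average tokens per word) as a stand-in for an "internal" metric. It is not the evaluation setup from the linked paper, and a real run would use an actual tokenizer library and held-out text.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Toy BPE trainer (hypothetical helper): learn up to `num_merges` merges."""
    # Each distinct word is a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single token
        # Most frequent pair; ties broken lexicographically so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def fertility(words, merges):
    """Average number of tokens per word (lower means more compression)."""
    return sum(len(encode(w, merges)) for w in words) / len(words)


def sweep(words, merge_budgets):
    """Train one tokenizer per merge budget and report fertility for each."""
    return {n: fertility(words, train_bpe(words, n)) for n in merge_budgets}


if __name__ == "__main__":
    corpus = ["low", "lower", "lowest", "newer", "wider"]
    # Vocab size here is (number of base characters) + (number of merges).
    for n, f in sweep(corpus, [0, 2, 5, 50]).items():
        print(f"merges={n:>3}  fertility={f:.2f}")
```

The same loop could be extended with a downstream ("external") score per vocab size, so that both kinds of metric are collected for each point in the range.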
