Take a ready-made tokenizer, or borrow existing code, and train it on your own data.

- [spaCy tokenizer guide](https://machinelearningknowledge.ai/complete-guide-to-spacy-tokenizer-with-examples/)
- [GPT-2 fast tokenizer source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/tokenization_gpt2_fast.py)
- [use transformers.GPT2Tokenizer](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Tokenizer)
- [Sberbank ruGPT-3 large (based on GPT-2), with the tokenizer vocab.json file](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2)
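
For a GPT-2-style tokenizer, "training on your own data" roughly means learning byte-pair-encoding (BPE) merge rules from a corpus. The sketch below is a toy, standard-library-only illustration of that idea, not the code used by `transformers.GPT2Tokenizer`. It works on characters rather than bytes, and the names `train_bpe`, `merge_pair` and `encode` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of strings."""
    # Each whitespace-separated word starts as a tuple of single characters.
    words = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often the word occurs.
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new rule to the whole vocabulary.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Split `word` into subword tokens by replaying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

Calling `train_bpe(lines, num_merges)` on your own text yields an ordered list of merges, and `encode(word, merges)` then splits a new word with them. For real work, the ready-made tokenizers linked in this note are the better starting point; the sketch only shows what the training step computes.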