chenyuanxing/n-gram

n-gram

A simple 2-gram project: a word segmentation exercise based on a 2-gram (bigram) model.

The corpus file is named 北大(人民日报)语料库199801.txt (Peking University People's Daily corpus, January 1998).  
It is encoded as UTF-8 without a BOM.
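
The corpus layout is not described here. As a rough sketch only, the snippet below assumes the commonly distributed layout of this corpus: whitespace-separated `word/tag` tokens, an ID token at the start of each line, and bracketed compounds such as `[中央/n  人民/n]nt`. The function name `parse_line` and the ID pattern are illustrative assumptions and are not taken from `test.py`.

```python
import re

# Assumed shape of the leading line ID, e.g. 19980101-01-001-001 (an assumption).
LINE_ID = re.compile(r"\d{8}-\d{2}-\d{3}-\d{3}")

def parse_line(line):
    """Return the plain words from one 'word/tag  word/tag ...' corpus line."""
    words = []
    for token in line.split():
        token = token.lstrip("[")                   # opening bracket of a compound
        token = re.sub(r"\][a-zA-Z]*$", "", token)  # closing bracket plus compound tag
        word, sep, _tag = token.rpartition("/")
        if not sep or not word:
            continue                                # not a word/tag token; skip it
        if LINE_ID.fullmatch(word):
            continue                                # skip the assumed line ID
        words.append(word)
    return words
```

If the actual file differs from this layout, the splitting rules would need to be adjusted accordingly.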

The test set is testset.txt.  
It is encoded as UTF-8 without a BOM.

The algorithm's segmentation output for the test set is in 2017110758.txt.  
That file is encoded as GBK.
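
Because the inputs are UTF-8 and the output is GBK, the encoding has to be given explicitly when reading and writing. The helpers below are a minimal sketch of one way to do that in Python; the names `read_utf8_lines` and `write_gbk_lines` are hypothetical, and replacing characters that GBK cannot represent with `?` is an assumption rather than documented behaviour of `test.py`.

```python
def read_utf8_lines(path):
    """Read a UTF-8 (no BOM) text file and return its lines without newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

def write_gbk_lines(path, lines):
    """Write lines to a GBK file; characters GBK cannot encode become '?'."""
    with open(path, "w", encoding="gbk", errors="replace") as f:
        for line in lines:
            f.write(line + "\n")
```

For example, `write_gbk_lines("2017110758.txt", results)` would write a list of segmented lines named `results` in the encoding stated above.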

The complete algorithm code is in test.py.
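
For readers unfamiliar with the approach, here is a minimal, illustrative sketch of bigram-based segmentation. It is not the code in `test.py`. It assumes the training data is already a list of word lists, uses add-one smoothing, limits candidate words to 4 characters, and falls back to single characters for unknown text. All of these choices, and the names `train`, `log_prob` and `segment`, are assumptions made for the example.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers

def train(sentences):
    """Count unigrams and bigrams over tokenized sentences (lists of words)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        seq = [BOS] + list(words) + [EOS]
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams

def log_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev); +1 leaves room for unseen words."""
    v = len(unigrams) + 1
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + v))

def segment(text, unigrams, bigrams, max_len=4):
    """Return the highest-scoring segmentation of text under the bigram model."""
    n = len(text)
    # best[i][w] = (score, backpointer) for the best path whose last word w ends at i
    best = [dict() for _ in range(n + 1)]
    best[0][BOS] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            # Candidates: known words, plus any single character as a fallback.
            if len(word) > 1 and word not in unigrams:
                continue
            for prev, (score, _) in best[start].items():
                s = score + log_prob(prev, word, unigrams, bigrams)
                if word not in best[end] or s > best[end][word][0]:
                    best[end][word] = (s, (start, prev))
    # Choose the final word, including the transition to the end marker.
    last = max(best[n], key=lambda w: best[n][w][0] + log_prob(w, EOS, unigrams, bigrams))
    words, pos, word = [], n, last
    while best[pos][word][1] is not None:  # walk the backpointers to the start
        words.append(word)
        pos, word = best[pos][word][1]
    return words[::-1]
```

A small usage example on toy data:

```python
toy = [["我", "喜欢", "北京"], ["我", "喜欢", "天安门"], ["北京", "天安门"]]
uni, bi = train(toy)
print(segment("我喜欢北京天安门", uni, bi))  # ['我', '喜欢', '北京', '天安门']
```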
