- Our released RRS corpus and crawled Douban Nonparallel corpus can be found here.
- Our released BERT-FP post-training checkpoint for the RRS corpus can be found here.
- Our post-training and fine-tuning checkpoints on Ubuntu, Douban, E-commerce, and our released RRS datasets are released here. Feel free to reproduce the experimental results in the paper.
-
Prepare the environment
- Note 1: Python 3.7 is recommended; other versions may cause problems, e.g. installing faiss fails under Python 3.8.
- Note 2: Install PyTorch manually for your CUDA version; otherwise obscure CUDA-side errors may be raised mid-run.
- Note 3: nlg_eval must be installed manually, and its `setup.py` needs the following one-line patch first:

```bash
diff -r nlg-eval-2.3.0.bk/setup.py nlg-eval-2.3.0/setup.py
24c24
< reqs = [str(ir.req) for ir in install_reqs]
---
> reqs = [str(ir.requirement) for ir in install_reqs]
```

Then install the dependencies:

```bash
pip install -r requirements.txt
```
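The `setup.py` fix above can also be scripted. Below is a minimal sketch (the function name `patch_nlg_eval` is ours, not part of the repo) that applies the same substitution in place:

```shell
# Hypothetical helper: apply the install_reqs fix from the diff above.
# Usage: patch_nlg_eval path/to/nlg-eval-2.3.0/setup.py
patch_nlg_eval() {
    # Rewrite the deprecated `ir.req` attribute access to `ir.requirement`.
    sed -i 's/str(ir\.req)/str(ir.requirement)/' "$1"
}
```

Run it on the unpacked nlg-eval source, then install nlg_eval as usual.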
-
Init the repo
Before using the repo, please run the following command to initialize it:

```bash
# create the necessary folders
python init.py
```

Note: edit `root_dir` in `config/base.yaml` to change where the data and logs are stored.
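For example, relocating the data and log directory is a one-key change (a sketch: the path below is illustrative, and every other key in `config/base.yaml` stays as shipped):

```yaml
# config/base.yaml (excerpt) -- only root_dir shown; the value is an example
root_dir: /data/dialogue
```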
-
Train the model
The necessary details can be found under the `config` folder.

```bash
# dataset_name: douban, ecommerce, ubuntu, restoration-200k
# model_name: dual-bert(DR-BERT), bert-ft, sa-bert, bert-fp(post-training), poly-encoder
./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
```
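As a sanity check on the argument order, the hypothetical wrapper below (not part of the repo) validates `<dataset_name>` against the supported datasets listed above and prints the `train.sh` command it would run, without invoking it:

```shell
# Hypothetical wrapper: validate the dataset argument, then print the command.
# Dataset names are taken from the comment above; train.sh is not invoked here.
train_cmd() {
    dataset="$1"; model="$2"; cuda_ids="$3"
    case "$dataset" in
        douban|ecommerce|ubuntu|restoration-200k)
            echo "./scripts/train.sh $dataset $model $cuda_ids" ;;
        *)
            echo "unknown dataset: $dataset" >&2; return 1 ;;
    esac
}
```

For instance, `train_cmd douban dual-bert 0,1` prints the command for training DR-BERT on Douban with GPUs 0 and 1.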
-
Test the model
```bash
./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
```