
RayXu14/DR-BERT


The source code of DR-BERT and baselines

Recent Activity

  1. Our released RRS corpus and crawled Douban Nonparallel corpus can be found here.
  2. Our released BERT-FP post-training checkpoint for the RRS corpus can be found here.
  3. Our post-training and fine-tuning checkpoints on the Ubuntu, Douban, E-commerce, and our released RRS datasets are available here. Feel free to use them to reproduce the experimental results in the paper.

How to Use

  1. prepare the environment

    Note 1: Python 3.7 is recommended; other versions may cause problems, e.g., installing faiss can fail under Python 3.8.

    Note 2: Install PyTorch manually according to your CUDA version; otherwise inexplicable CUDA-side errors may be thrown mid-run.

    Note 3: nlg_eval needs to be installed manually:

    1. Download the original nlg_eval package.

    2. You may run into a problem; the solution is as follows:

    diff -r nlg-eval-2.3.0.bk/setup.py nlg-eval-2.3.0/setup.py
    24c24
    <     reqs = [str(ir.req) for ir in install_reqs]
    ---
    >     reqs = [str(ir.requirement) for ir in install_reqs]
    
    pip install -r requirements.txt
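    The one-line diff above can also be applied programmatically. Below is a minimal Python sketch; `patch_setup` is a hypothetical helper, not part of this repo, and the `nlg-eval-2.3.0/setup.py` path is an assumption (adjust it to wherever you unpacked the archive). The change accounts for newer pip versions renaming `InstallRequirement.req` to `.requirement`.

    ```python
    # Sketch: apply the one-line setup.py fix in place.
    # patch_setup is a hypothetical helper, not part of this repo.
    from pathlib import Path

    def patch_setup(path):
        """Replace str(ir.req) with str(ir.requirement); return True if changed."""
        p = Path(path)
        text = p.read_text()
        fixed = text.replace("str(ir.req)", "str(ir.requirement)")
        if fixed != text:
            p.write_text(fixed)
        return fixed != text

    # e.g. patch_setup("nlg-eval-2.3.0/setup.py")  # path is an assumption
    ```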
  2. init the repo

    Before using the repo, please run the following command to init:

    # create the necessary folders
    python init.py

    Note: edit root_dir in config/base.yaml to change where the data and records are stored.
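    For illustration, such a config/base.yaml fragment might look like the following. Only the root_dir key comes from the note above; the path value is an arbitrary example, not the repo's default.

    ```yaml
    # config/base.yaml (fragment; the path value is an arbitrary example)
    root_dir: /data/dr-bert-workspace
    ```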

  3. train the model

    The necessary details can be found under the config folder.

    # dataset_name: douban, ecommerce, ubuntu, restoration-200k
    # model_name: dual-bert(DR-BERT), bert-ft, sa-bert, bert-fp(post-training), poly-encoder
    ./scripts/train.sh <dataset_name> <model_name> <cuda_ids>
  4. test the model

    ./scripts/test_rerank.sh <dataset_name> <model_name> <cuda_id>
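    Putting steps 3 and 4 together, a concrete run might look like the sketch below. The dataset and model names are taken from the lists above; the GPU ids are arbitrary, and the guard simply makes the snippet a no-op outside a repo checkout.

    ```shell
    # Example: fine-tune DR-BERT (dual-bert) on Douban, then rerank-test.
    # Guarded so this is a harmless no-op outside the repo checkout.
    if [ -x scripts/train.sh ]; then
        ./scripts/train.sh douban dual-bert 0,1        # train on GPUs 0 and 1
        ./scripts/test_rerank.sh douban dual-bert 0    # evaluate on GPU 0
    fi
    ```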
