Description
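Hello, and thank you for open-sourcing the code for the paper. I am trying to reproduce your work. After a few small code changes (file paths, ground_truth preprocessing, some device-related bugs, and disabling fp16), training runs, but accuracy stays below 50%. I suspect my data preprocessing or PyTorch version differs from what you expected. Could you publish the detailed steps for running the code so that the results can be reproduced? Thanks!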
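Thank you very much for your interest in our work.
The main points to check are:
- Whether the training parameters match
- Whether the pretrained model matches; we used OpenCLaP
- Whether data augmentation was applied
- NVIDIA Apex reduces memory usage and speeds up training, so we recommend enabling it
- We trained with CUDA 10.2 and PyTorch 1.4.0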
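The environment point in the list above can be checked mechanically. The following is only a minimal sketch, assuming PyTorch is importable in the training environment; `EXPECTED`, `collect_versions`, and `mismatches` are hypothetical names for illustration and are not part of the repository.

```python
import importlib

# Versions the authors report training with (from the reply above).
EXPECTED = {"torch": "1.4.0", "cuda": "10.2"}


def collect_versions():
    """Return the installed PyTorch version and the CUDA version it was built with."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda": None}
    return {
        # Drop local build suffixes such as "+cu102" before comparing.
        "torch": str(torch.__version__).split("+")[0],
        "cuda": torch.version.cuda,
    }


def mismatches(actual, expected=EXPECTED):
    """Map each differing key to an (actual, expected) pair."""
    return {
        key: (actual.get(key), want)
        for key, want in expected.items()
        if actual.get(key) != want
    }


if __name__ == "__main__":
    print(mismatches(collect_versions()))
```

An empty dictionary means the local environment reports the same PyTorch and CUDA versions as the authors' setup. It does not say anything about the other points in the list.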
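Hello, I am seeing the same problem as the original poster: accuracy is around 47% and stays essentially flat over 6 epochs without converging. I have also gone through the five points you listed and confirmed that my setup matches. The training log is below. Do you have any suggestions or thoughts?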
2020-11-27 06:57:10 - train model - INFO - Algorithm: LFESM
2020-11-27 06:57:10 - train model - INFO - ***** Start training *****
2020-11-27 06:57:10 - train model - INFO - Dataset: ./data/raw/CAIL2019-SCM-big/SCM_5k.json
2020-11-27 06:57:10 - train model - INFO - Device: cuda GPU Num: 2
2020-11-27 06:57:10 - train model - INFO - Config: {
"batch_size": 3,
"epochs": 6,
"fp16": false,
"fp16_opt_level": "O1",
"learning_rate": 2e-05,
"max_grad_norm": 1.0,
"max_length": 512,
"warmup_steps": 0.1
}
2020-11-27 06:57:21 - train model - INFO - Num examples = 10185
2020-11-27 06:57:21 - train model - INFO - Batch size = 3
2020-11-27 06:57:21 - train model - INFO - Num steps = 20370
2020-11-27 08:02:42 - train model - INFO - Epoch 1, train Loss: 0.2109890, eval acc: 0.44466666666666665, eval loss: 0.1182363, test acc: 0.4759114583333333, test loss: 0.1140847
2020-11-27 09:07:45 - train model - INFO - Epoch 2, train Loss: 0.2052897, eval acc: 0.44066666666666665, eval loss: 0.1631643, test acc: 0.4759114583333333, test loss: 0.1540094
2020-11-27 10:12:46 - train model - INFO - Epoch 3, train Loss: 0.2257743, eval acc: 0.442, eval loss: 0.1747637, test acc: 0.4772135416666667, test loss: 0.1637522
2020-11-27 11:17:51 - train model - INFO - Epoch 4, train Loss: 0.2261967, eval acc: 0.45266666666666666, eval loss: 0.1677068, test acc: 0.48828125, test loss: 0.1562569
2020-11-27 12:22:51 - train model - INFO - Epoch 5, train Loss: 0.2206582, eval acc: 0.456, eval loss: 0.1648252, test acc: 0.48828125, test loss: 0.1548795
2020-11-27 13:27:46 - train model - INFO - Epoch 6, train Loss: 0.2065234, eval acc: 0.456, eval loss: 0.1655720, test acc: 0.4889322916666667, test loss: 0.1561106
2020-11-27 13:27:46 - train model - INFO - ***** Training complete *****
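To compare runs like the one above, the per-epoch metrics can be pulled out of the log with a small script. This is a sketch that assumes the log line format shown in this thread; `EPOCH_RE`, `parse_epochs`, and `total_steps` are hypothetical helpers, and the step formula is only an assumption that happens to be consistent with the logged numbers, not necessarily how the repository computes it.

```python
import re

# Matches the "Epoch N, train Loss: ..." lines in the format logged above.
EPOCH_RE = re.compile(
    r"Epoch (?P<epoch>\d+), train Loss: (?P<train_loss>[\d.]+), "
    r"eval acc: (?P<eval_acc>[\d.]+), eval loss: (?P<eval_loss>[\d.]+), "
    r"test acc: (?P<test_acc>[\d.]+), test loss: (?P<test_loss>[\d.]+)"
)

# Two lines copied from the log in this thread.
SAMPLE = """\
2020-11-27 08:02:42 - train model - INFO - Epoch 1, train Loss: 0.2109890, eval acc: 0.44466666666666665, eval loss: 0.1182363, test acc: 0.4759114583333333, test loss: 0.1140847
2020-11-27 13:27:46 - train model - INFO - Epoch 6, train Loss: 0.2065234, eval acc: 0.456, eval loss: 0.1655720, test acc: 0.4889322916666667, test loss: 0.1561106
"""


def parse_epochs(log_text):
    """Return one dict of metrics per epoch line found in log_text."""
    rows = []
    for match in EPOCH_RE.finditer(log_text):
        row = {key: float(value) for key, value in match.groupdict().items()}
        row["epoch"] = int(row["epoch"])
        rows.append(row)
    return rows


def total_steps(num_examples, batch_size, epochs):
    """Assumed formula: ceil(num_examples / batch_size) steps per epoch."""
    return -(-num_examples // batch_size) * epochs


if __name__ == "__main__":
    for row in parse_epochs(SAMPLE):
        print(row["epoch"], row["eval_acc"], row["test_acc"])
    # The logged values are 10185 examples, batch size 3, 6 epochs.
    print(total_steps(10185, 3, 6))
```

With the logged values, `total_steps(10185, 3, 6)` gives 20370, which matches the `Num steps` line in the log.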