Language models have demonstrated remarkable versatility across various fields, and their extensive knowledge and reasoning capabilities show promise for enhancing recommendation tasks. Fine-tuning language models for downstream tasks can further boost their effectiveness; however, doing so on large-scale recommendation data is often prohibitively time-consuming due to the volume of user-item interactions and the complexity of language models. This repo therefore provides an efficient fine-tuning method that quickly improves the pre-trained language model's (PLM's) compatibility with recommendation data while leveraging texts to improve CTR prediction performance.
For more details, please refer to our paper.
Clone this repo and set `DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data` in your environment.
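For example, in a bash-style shell (keeping the repo's `[DOWNLOAD_PATH]` placeholder for wherever you put the data):

```bash
# Point the scripts at the downloaded data;
# [DOWNLOAD_PATH] is a placeholder for your actual download location.
export DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data
```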
Download the Amazon Sports dataset from here, then process the data:
```bash
python build_dataset.py amazon-sports
```

## ID-Based Model

```bash
python run_ctr.py amazon-sports
```
## Pre-training LM

Set `model_name_or_path` in `config/mlm.yaml`, then run:

```bash
python script/run_mlm.py amazon-sports
```
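The `model_name_or_path` field mentioned above can be set by editing `config/mlm.yaml` directly; if you prefer the command line, something like the following `yq` (v4) invocation should work. The model name is only an illustration, and the exact key layout is an assumption, so check the file itself:

```bash
# Assumes model_name_or_path is a top-level key in config/mlm.yaml and that
# yq v4 is installed; "bert-base-uncased" is just an example value.
yq -i '.model_name_or_path = "bert-base-uncased"' config/mlm.yaml
```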
## Fine-tuning LM

Set `ctr_model/pretrained_dir` in `config/align.yaml`, then run:

```bash
python script/run_align.py amazon-sports [PRE_TRAINED_LM_PATH]
```
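Similarly, the `ctr_model/pretrained_dir` entry can be edited by hand in `config/align.yaml` or set from the command line. In the sketch below, `[CTR_MODEL_DIR]` is a hypothetical placeholder for the directory of a pre-trained CTR model checkpoint (presumably the one trained in the ID-Based Model step), and the nested key layout is an assumption to verify against the file:

```bash
# Assumes pretrained_dir is nested under ctr_model in config/align.yaml;
# [CTR_MODEL_DIR] is a hypothetical placeholder for your CTR model checkpoint.
yq -i '.ctr_model.pretrained_dir = "[CTR_MODEL_DIR]"' config/align.yaml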
## Training recommendation backbone

```bash
python script/run_cotrain.py amazon-sports [OUTPUT_PATH] [FT_LM_PATH] [TOKENIZER_PATH]
```

If you find this project useful in your research, please cite our paper:
```bibtex
@article{wang2024cela,
  title={CELA: Cost-Efficient Language Model Alignment for CTR Prediction},
  author={Wang, Xingmei and Liu, Weiwen and Chen, Xiaolong and Liu, Qi and Huang, Xu and Wang, Yichao and Li, Xiangyang and Wang, Yasheng and Dong, Zhenhua and Lian, Defu and Tang, Ruiming},
  journal={arXiv preprint arXiv:2405.10596},
  year={2024}
}
```
