We provide all environment configurations in environment.txt. To install all packages, you can create a conda environment and install the packages as follows:
conda create -n lemma python=3.8
conda activate lemma
pip install -r environment.txt
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
In our experiments, we used NVIDIA CUDA 11.3 on Ubuntu 20.04. Similar CUDA versions should also work, with the corresponding versions of torch and torchvision.
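To confirm that the installed torch build matches the cu113 wheels, a quick check like the following can be run (a minimal sketch; the exact version strings depend on your installation):

```python
# check_env.py -- minimal sanity check for the installed torch/CUDA stack
import torch
import torchvision

print("torch:", torch.__version__)                # e.g. a +cu113 build
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version used by torch:", torch.version.cuda)  # should report 11.3 for the cu113 wheels
```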
For downloading the data, features, and checkpoints, please refer to our website. Within the Google Drive, you can find features and model checkpoints under features/ and checkpoints/ respectively.
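If you prefer fetching the Google Drive files from the command line instead of a browser, a tool such as gdown can help. The sketch below is only an illustration: the file ID is a hypothetical placeholder, and the actual links/IDs should be taken from our website.

```python
# download_example.py -- illustrative only; FILE_ID below is a hypothetical placeholder
import gdown  # pip install gdown (recent versions accept the id= argument)

FILE_ID = "<google-drive-file-id-from-our-website>"  # replace with the real ID
gdown.download(id=FILE_ID, output="video_feature_20.h5", quiet=False)
```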
After downloading, create data/ under the current directory:
$ cd lemma_simple_model
$ mkdir data
$ mkdir data/hcrn_data
Next, put the features, data, and checkpoints into subdirectories as follows (a quick existence check is sketched after this list):
- download and put `features/video_feature_20.h5` to `data/`.
- download and put `features/lemma-qa_appearance_feat.h5`, `features/lemma-qa_motion_feat.h5` to `data/hcrn_data/`.
- download `features/video_features.zip` and unzip it to `$FEATURE_BASE_PATH`.
- download `features/glove.840.300d.pkl` to `$GLOVE_PT_PATH` and set `glove_pt_path` to `$GLOVE_PT_PATH` in `preprocess/generate_glove_matrix.py`.
- download and put `data/train_qas.json`, `data/test_qas.json`, `data/val_qas.json`, `data/tagged_qa.json`, `data/vid_intervals.json` to `$BASE_DATA_DIR`.
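As a quick sanity check that everything landed in the right place, something along these lines can be run (a minimal sketch; set BASE_DATA_DIR to your own `$BASE_DATA_DIR`):

```python
# check_layout.py -- verify that the downloaded files sit where the code expects them
import os

BASE_DATA_DIR = "data"  # set to your $BASE_DATA_DIR

expected = [
    "data/video_feature_20.h5",
    "data/hcrn_data/lemma-qa_appearance_feat.h5",
    "data/hcrn_data/lemma-qa_motion_feat.h5",
    os.path.join(BASE_DATA_DIR, "train_qas.json"),
    os.path.join(BASE_DATA_DIR, "test_qas.json"),
    os.path.join(BASE_DATA_DIR, "val_qas.json"),
    os.path.join(BASE_DATA_DIR, "tagged_qa.json"),
    os.path.join(BASE_DATA_DIR, "vid_intervals.json"),
]

for path in expected:
    print(("OK   " if os.path.exists(path) else "MISSING ") + path)
```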
After downloading all data to their correct locations, run the following for preprocessing:
$ chmod a+x PREPROCESS.sh
$ ./PREPROCESS.sh $BASE_DATA_DIR
This script runs the following preprocessing steps for features and text:
- `$ python preprocess/preprocess_vocab.py`
  This will generate `lemma-qa_vocab.json`.
- `$ python preprocess/mode_qas2mode_qas_encode.py`
  This will convert `{mode}_qas.json` and `lemma-qa_vocab.json` to `{mode}_qas_encode.json`, `answer_set.txt`, and `vocab.txt`.
- `$ python preprocess/generate_glove_matrix.py`
  Before running `PREPROCESS.sh`, please make sure that `glove_pt_path` is set correctly. This script will generate `glove.pt`.
- `$ python preprocess/generate_char_vocab.py`
  This script will generate `char_vocab.txt`.
- `$ python preprocess/format_mode_qas_encode.py {mode}`
  Before running the experiments, please make sure that `max_word_len` in `preprocess/format_mode_qas_encode.py` is equal to `args.char_max_len` defined in `train_psac.py`. Similarly, make sure that `max_sentence_len` in `preprocess/format_mode_qas_encode.py` is equal to `args.max_len` in `train_psac.py`, `train_linguistic_bert.py`, and `train_visual_bert.py`.
- `$ python preprocess/reasoning_types.py`
  This will generate `all_reasoning_types.txt`.
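After PREPROCESS.sh finishes, the generated files listed above should all exist. A quick check like the one below can catch a skipped step (a minimal sketch, assuming the outputs are written under `$BASE_DATA_DIR`; adjust the directory if your setup writes them elsewhere):

```python
# check_preprocess.py -- verify that the preprocessing outputs listed above were generated
# (assumes they live under $BASE_DATA_DIR; adjust if your setup differs)
import os

BASE_DATA_DIR = "data"  # set to your $BASE_DATA_DIR

outputs = [
    "lemma-qa_vocab.json",
    "train_qas_encode.json", "val_qas_encode.json", "test_qas_encode.json",
    "answer_set.txt", "vocab.txt",
    "glove.pt",
    "char_vocab.txt",
    "all_reasoning_types.txt",
]

missing = [f for f in outputs if not os.path.exists(os.path.join(BASE_DATA_DIR, f))]
print("missing:", missing if missing else "none")
```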
To train a model from scratch, we provide the following training scripts:
- `train_hcrn.py`: HCRN experiment
- `train_hga.py`: HGA experiment
- `train_hme.py`: HME experiment
- `train_linguistic_bert.py`: BERT experiment
- `train_psac.py`: PSAC experiment
- `train_pure_lstm.py`: LSTM experiment (additional LSTM and CNN-LSTM experiments)
- `train_visual_bert.py`: VisualBERT experiment
Use the following commands, substituting $TRAIN_MODEL_PY with the script you want to experiment with:
$ python $TRAIN_MODEL_PY --base_data_dir $BASE_DATA_DIR
for models $TRAIN_MODEL_PY in train_hcrn.py, train_hme.py, train_hga.py (you might also want to change `app_feat_path`, `motion_feat_path`, and `video_feat_path` in these files to adjust the feature paths), and
$ python $TRAIN_MODEL_PY --feature_base_path $FEATURE_BASE_PATH --base_data_dir $BASE_DATA_DIR
for models $TRAIN_MODEL_PY in train_psac.py, train_pure_lstm.py, train_linguistic_bert.py, train_visual_bert.py.
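If you want to launch several experiments in one go, a small driver like the one below can wrap the two command variants above (a minimal sketch; it only reproduces the flags shown here, so add any model-specific arguments you need):

```python
# run_experiments.py -- launch the training scripts above with the appropriate flags
import subprocess

BASE_DATA_DIR = "data"          # your $BASE_DATA_DIR
FEATURE_BASE_PATH = "features"  # your $FEATURE_BASE_PATH

# scripts that only need --base_data_dir
for script in ["train_hcrn.py", "train_hme.py", "train_hga.py"]:
    subprocess.run(["python", script, "--base_data_dir", BASE_DATA_DIR], check=True)

# scripts that additionally need --feature_base_path
for script in ["train_psac.py", "train_pure_lstm.py",
               "train_linguistic_bert.py", "train_visual_bert.py"]:
    subprocess.run(["python", script,
                    "--feature_base_path", FEATURE_BASE_PATH,
                    "--base_data_dir", BASE_DATA_DIR], check=True)
```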
For the BERT-based models, you need to set `BertTokenizer_CKPT` and `BertModel_CKPT` so that the model can load the pretrained weights from Hugging Face.
- For linguistic_bert, set `BertTokenizer_CKPT="bert-base-uncased"` and `BertModel_CKPT="bert-base-uncased"`.
- For visual_bert, set `BertTokenizer_CKPT="bert-base-uncased"` and `VisualBertModel_CKPT="uclanlp/visualbert-vqa-coco-pre"`.
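These checkpoint names are standard Hugging Face identifiers. As a rough illustration of what they resolve to (the actual loading happens inside the training scripts), with a reasonably recent transformers version:

```python
# hf_checkpoints.py -- what the checkpoint identifiers above resolve to in transformers
from transformers import BertTokenizer, BertModel, VisualBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")                              # linguistic_bert
visual_bert = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")   # visual_bert
```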
To reload checkpoints and only run inference on test_qas, run the following command:
$ python $TRAIN_MODEL_PY --base_data_dir $BASE_DATA_DIR --reload_model_path $RELOAD_MODEL_PATH --test_only 1
for models $TRAIN_MODEL_PY in train_hcrn.py, train_hme.py, train_hga.py, and
$ python $TRAIN_MODEL_PY --feature_base_path $FEATURE_BASE_PATH --base_data_dir $BASE_DATA_DIR --reload_model_path $RELOAD_MODEL_PATH --test_only 1
for models $TRAIN_MODEL_PY in train_psac.py, train_pure_lstm.py, train_linguistic_bert.py, train_visual_bert.py.
This code heavily uses resources from VisualBERT, HCRN, HGA, HME, and PSAC. We thank the authors for open-sourcing their awesome projects.