data/raw/ contains the data_stage_one data files from the convote dataset, split into train/, test/, and dev/. The files are separated by speaker, and the speaker's political party is encoded in the file name as the character marked 'P' in the pattern ([0-9]\_)+P\w\w.txt, where 'R' represents the Republican party and 'D' represents the Democratic party.
The data can also be downloaded here.
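As an illustration of the naming scheme above, the following is a minimal sketch of how the party could be read from a file name. The function name `party_from_filename` and the example file names are hypothetical, and the regex is a slightly loosened version of the pattern above that assumes each underscore-separated numeric field may contain more than one digit.

```python
import os
import re

# Assumption: one or more numeric fields, each followed by "_",
# then the party letter and two more word characters before ".txt".
FILENAME_RE = re.compile(r"^(?:[0-9]+_)+([A-Z])\w\w\.txt$")

# Only the two party codes described in this README are mapped.
PARTIES = {"R": "Republican", "D": "Democratic"}


def party_from_filename(path):
    """Return the speaker's party for a data file, or None if unknown."""
    name = os.path.basename(path)
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return PARTIES.get(match.group(1))


if __name__ == "__main__":
    # Hypothetical path, used only to show the call.
    print(party_from_filename("data/raw/train/052_400011_0327014_DON.txt"))
```

Any party letter other than 'R' or 'D', or a file name that does not fit the pattern, yields None here rather than raising an error, so the function can be used to filter a directory listing.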
The project requires nltk and spacy. Install the dependencies with:
pip install -r requirements.txt
Tag the data with:
python tag_raw_dataset.py --tagger <tagger_name>
Valid tagger names include hmm, perceptron, and spacy.
To run the perceptron on the untagged data, use:
python perceptron.py <num_iterations>