Mini-Project 4 - Task 1
Alexandre Banon; Vincent Delmas; Michael Haaf
Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
<Sentence_embedding.py> contains our reproduction of Algorithm 1 from the paper (Arora et al., 2017).
<function.py> and <frequence.py> contain helper functions used by the main algorithm.
<figure_2_a.py> contains a script that produces Figure 2 of our report.
<environment.py> checks that the workspace is organized as expected and provides the link to download the datasets.
numpy, time, sys, os, math, nltk, codecs, pandas, csv, matplotlib
STS (2012 to 2015) and SICK 2014
GloVe embedding pre-trained vectors
PSL embedding pre-trained vectors
enwiki database from Wikipedia articles
[a] A number: the parameter a in the word weight a / (a + p(w)) of the method proposed in the paper, where p(w) is the estimated frequency of word w
[task] Possible values : "STS 2012", "STS 2013", "STS 2014", "STS 2015", "SICK 2014"
[methode] Possible values : "WR", "avg", "bin", "g", "h"
[word_embedding] Possible values : "GloVe", "PSL"
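
For orientation, here is a minimal sketch of Algorithm 1 from the paper (Arora et al., 2017), which roughly corresponds to the "WR" setting with the [a] parameter above. It is not the code in <Sentence_embedding.py>: the function name `sif_embeddings`, its arguments, the default value of `a`, and the handling of unknown words are all assumptions made for illustration. Each sentence is a list of tokens, `word_vectors` maps a word to its embedding, and `word_prob` maps a word to its estimated frequency p(w).

```python
import numpy as np


def sif_embeddings(sentences, word_vectors, word_prob, a=1e-3):
    """Hypothetical sketch of Algorithm 1 (weighted average + component removal).

    sentences    : list of token lists
    word_vectors : dict word -> sequence of floats (e.g. GloVe or PSL vectors)
    word_prob    : dict word -> estimated unigram probability p(w)
    a            : weighting parameter; the default is only an illustrative value
    """
    dim = len(next(iter(word_vectors.values())))
    rows = []
    for sentence in sentences:
        # Assumption: words without an embedding are simply skipped.
        words = [w for w in sentence if w in word_vectors]
        if not words:
            rows.append(np.zeros(dim))
            continue
        # Weight a / (a + p(w)); assumption: unseen words get p(w) = 0.
        weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in words])
        vectors = np.array([word_vectors[w] for w in words], dtype=float)
        # Weighted average of the word vectors of the sentence.
        rows.append(weights @ vectors / len(words))

    X = np.vstack(rows)  # one sentence embedding per row

    # First singular vector of the sentence-embedding matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]

    # Remove the projection of every sentence embedding onto u.
    return X - np.outer(X @ u, u)


if __name__ == "__main__":
    toy_vectors = {"cat": [1.0, 0.0, 0.5], "dog": [0.9, 0.1, 0.4], "runs": [0.0, 1.0, 0.2]}
    toy_prob = {"cat": 0.01, "dog": 0.01, "runs": 0.05}
    emb = sif_embeddings([["cat", "runs"], ["dog", "runs"], ["cat", "dog"]], toy_vectors, toy_prob)
    print(emb.shape)
```

In this sketch the singular vector is computed on the same set of sentences that is being embedded, so the result depends on the whole batch and not on each sentence alone.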