This is a query recommendation system that uses Jaccard similarity and collaborative filtering to recommend queries to users based on the queries they have previously rated. It also provides a way to assign a query a general rating that applies to all users. The project was done as part of the "Data Mining" course at the University of Trento by Seyed Mohammad Mousavi and Omar Facchini. The project report is in the report folder.
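To make the idea concrete, here is a minimal sketch of how Jaccard similarity and user-based collaborative filtering can be combined. It is not the code from the notebooks: the function names, the `ratings` layout (user to query to rating), and the example values are all assumptions made for illustration. Users are compared by the sets of queries they have rated, and a missing rating is predicted as the similarity-weighted mean of other users' ratings.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_rating(ratings, user, query):
    """Predict `user`'s rating for `query` as the Jaccard-weighted mean of
    the ratings other users gave to that query. Returns None when no
    similar user has rated it."""
    rated_by_user = set(ratings[user])
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or query not in other_ratings:
            continue
        sim = jaccard(rated_by_user, set(other_ratings))
        weighted_sum += sim * other_ratings[query]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


def recommend(ratings, user, k=3):
    """Return up to k (query, predicted rating) pairs for queries the user
    has not rated yet, best first."""
    all_queries = {q for user_ratings in ratings.values() for q in user_ratings}
    candidates = []
    for query in sorted(all_queries - set(ratings[user])):
        score = predict_rating(ratings, user, query)
        if score is not None:
            candidates.append((query, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]


# Hypothetical example data: user -> {query id -> rating}.
ratings = {
    "u1": {"q1": 80, "q2": 60},
    "u2": {"q1": 90, "q2": 50, "q3": 70},
    "u3": {"q2": 40, "q3": 20},
}

if __name__ == "__main__":
    print(recommend(ratings, "u1"))
```

In this sketch, users who have rated many of the same queries as the target user get more weight, regardless of whether their ratings agree. A similarity that also compares rating values would be a natural refinement.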
To start working, first install virtualenv with pip.
    pip install virtualenv
Then create an empty virtual environment.
    virtualenv .venv
Note that .venv is the name of the virtual environment directory; this directory is excluded from version control by the .gitignore file.
After creating the virtual environment, activate it.
UNIX-based operating systems (GNU/Linux, macOS, etc.):
    source .venv/bin/activate
Windows:
    .\.venv\Scripts\activate
Now you can install the required Python packages in the clean environment you just created.
    pip install -r requirements.txt
The data needed to run the system is already provided.
To generate different datasets, change to the queryrec folder and run python .\src\datagen\datagenerator.py on Windows or python ./src/datagen/datagenerator.py on Linux.
This command generates datasets of three different sizes; the sizes are currently fixed.
The src folder contains two folders: datagen for the data generation and dataSetup for the actual implementation of the system.
In the dataSetup folder there are three .ipynb notebooks that walk through the application step by step.
The query_recommendation.ipynb notebook covers only the baseline, as it uses only the smallest dataset.
The query_recommendation_evaluation.ipynb notebook is the main one to use when running the entire system, as it uses all the datasets.
The general_utility.ipynb notebook covers part B of the project and shows how our idea of utility could be implemented. Since its main purpose is to show how well our chosen approach to utility works, it uses only the baseline dataset.
The easiest way to run these notebooks is with Jupyter, which displays the code in separate blocks and lets you run each one individually. For the notebooks to work properly, the blocks have to be run in order.
To install it, run pip install jupyterlab, and to start it, run jupyter lab.
For more information on installation, check out the Jupyter website.
Another way to run these files is through Visual Studio Code with the Jupyter extension, which can be found in the editor's Extensions view.
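A notebook can also be executed from top to bottom without opening an editor, which guarantees that the blocks run in order. The sketch below is a suggestion rather than part of the project: it assumes the notebooks live under src/dataSetup (inferred from the folder description above), that the jupyter command from the active virtual environment is on the PATH, and that nbconvert is available (it is normally installed together with jupyterlab). The helper name `nbconvert_command` is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed location of the notebooks, relative to the queryrec folder.
NOTEBOOK_DIR = Path("src") / "dataSetup"


def nbconvert_command(notebook: Path) -> List[str]:
    """Build the command that executes every cell of `notebook` in order
    and writes the outputs back into the same file."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        str(notebook),
    ]


if __name__ == "__main__":
    target = NOTEBOOK_DIR / "query_recommendation.ipynb"
    # check=True raises CalledProcessError if any cell fails.
    subprocess.run(nbconvert_command(target), check=True)
```

Swapping the file name for query_recommendation_evaluation.ipynb or general_utility.ipynb runs the other notebooks the same way.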