Fourth work of the UB course "Introduction to Machine Learning", implementing dimensionality reduction and visualization techniques
Eva Veli, Andras Kasa and Niklas Long Schiefelbein
- PyCharm IDE (Professional or Community Edition)
- Python 3.9 installed on your system
1. Open the project IML4 in PyCharm
2. Open the terminal in PyCharm (View > Tool Windows > Terminal)
3. Optional: Verify that the current location is IML4 with pwd
4. Optional: Navigate to IML4 with cd
5. Create a virtual environment:
   # Windows
   py -3.9 -m venv venv
   # macOS/Linux
   python3.9 -m venv venv
6. Activate the virtual environment:
   # Windows
   venv\Scripts\activate
   # macOS/Linux
   source venv/bin/activate
   In front of the input line in the terminal it should now say (venv)
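Besides looking for the (venv) prefix, the activation can be checked from Python itself. The following is a minimal sketch, not part of the project; the helper names `in_virtualenv` and `python_version_ok` are made up for illustration. It relies on the fact that inside a venv `sys.prefix` differs from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_version_ok(required=(3, 9)) -> bool:
    """Return True when the running major.minor version equals `required`."""
    return sys.version_info[:2] == tuple(required)


print("virtual environment active:", in_virtualenv())
print("Python 3.9 interpreter:", python_version_ok())
```

If either line prints False, the terminal is probably using the system interpreter instead of the one created in step 5.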
With the virtual environment activated:
pip install -r requirements.txt
From here you can directly jump to Run app.py
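To confirm that the installation succeeded, the installed distributions can be compared against requirements.txt. This is a rough sketch under assumptions: the helper `missing_requirements` is hypothetical, the parsing is deliberately naive (it only extracts the distribution name and ignores version pins), and the contents of the project's requirements.txt are not shown here.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(path="requirements.txt"):
    """Return the requirement names listed in `path` that are not installed."""
    missing = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        # Keep only the distribution name; drop extras, pins and markers.
        name = re.split(r"[\s<>=!~;\[]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if Path("requirements.txt").exists():
    print("missing packages:", missing_requirements())
```

An empty list means every listed package was found in the active environment; it does not check that the installed versions match the pinned ones.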
With the virtual environment activated:
deactivate
The (venv) in front of the terminal should be gone
To make sure the terminal is located in the IML4 directory, follow the optional steps 3 and 4 from the Manual Virtual Environment Setup
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
In front of the input line in the terminal it should now say (venv)
python app.py
The first execution takes longer than usual because the whole project is compiled initially. Once compiled, the program prompts the user to choose between the cmc and the pen-based dataset for the analysis. Pressing Enter without typing anything selects the cmc dataset by default.
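The dataset prompt with its default can be pictured roughly as follows. This is an illustrative sketch only: the function `choose_dataset`, the prompt text, and the exact strings accepted by app.py are assumptions, since the actual implementation is not shown in this README. Only the two dataset names and the cmc default come from the description above.

```python
VALID_DATASETS = ("cmc", "pen-based")  # names taken from the description above
DEFAULT_DATASET = "cmc"                # selected when the user just presses Enter


def choose_dataset(read=input):
    """Ask for a dataset name until a valid one, or an empty line, is given."""
    while True:
        answer = read(f"Dataset {VALID_DATASETS} [default: {DEFAULT_DATASET}]: ")
        answer = answer.strip().lower()
        if not answer:
            return DEFAULT_DATASET  # empty input falls back to the default
        if answer in VALID_DATASETS:
            return answer
        print(f"Unknown dataset {answer!r}, please try again.")
```

Passing the reader as a parameter keeps the function testable without a terminal; calling `choose_dataset()` with no arguments reads from the keyboard.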
Next, the preprocessing pipeline runs. The user can then choose from 11 menu items, including running the clustering algorithms, generating reports and exiting the program. Menu item 1 covers all PCA analysis items. Progress for each function is printed to the console, but because the computations are fast, the output can be hard to follow. Refer to the final reports for evaluation.
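A numbered menu of this kind is commonly built as a dispatch loop. The sketch below is hypothetical and does not reproduce app.py: `run_menu`, `EXAMPLE_ACTIONS`, the labels, and every key other than item 1 are placeholders, and only three of the 11 entries are shown.

```python
def run_menu(actions, read=input):
    """Show a numbered menu and dispatch choices until the exit entry is picked.

    `actions` maps a menu key to a (label, callable) pair; a callable of None
    marks the exit entry.
    """
    while True:
        for key, (label, _) in actions.items():
            print(f"{key}. {label}")
        choice = read("Select an option: ").strip()
        if choice not in actions:
            print("Invalid option, please try again.")
            continue
        label, action = actions[choice]
        if action is None:
            return  # exit entry selected
        print(f"Running: {label}")
        action()


# Placeholder entries; the real menu has 11 items with its own numbering.
EXAMPLE_ACTIONS = {
    "1": ("PCA analysis", lambda: print("PCA analysis placeholder")),
    "2": ("Clustering", lambda: print("clustering placeholder")),
    "3": ("Exit", None),
}
```

Calling `run_menu(EXAMPLE_ACTIONS)` starts the interactive loop; invalid input simply redisplays the menu.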
For deeper insights please consider reading the report of the project.
IML4/
├── cluster_algorithms/ # K-Means++ and OPTICS
├── datasets/ # Dataset files
├── metrics/ # Distance and evaluation metrics
├── pca/ # PCA code and projection data
├── plots/ # All plots
├── plotting/ # Code used to create plots
├── preprocessing/ # Preprocessing functions
├── results/ # CSV results for each algorithm
├── summary/ # LaTeX tables and their code
├── .gitignore # Gitignore file
├── app.py # Main application script
├── README.md # This file
├── requirements.txt # Dependencies
└── utils.py # Utility functions
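The layout above can also be used to check that the terminal is in the project root before running app.py. This is a small sketch with a hypothetical helper, `missing_entries`; the directory and file names are copied from the tree, and it is an assumption that all of them exist in a fresh checkout (output folders such as plots/ and results/ may only be created at run time).

```python
from pathlib import Path

# Names copied from the project tree above.
EXPECTED_DIRS = [
    "cluster_algorithms", "datasets", "metrics", "pca", "plots",
    "plotting", "preprocessing", "results", "summary",
]
EXPECTED_FILES = ["app.py", "requirements.txt", "utils.py"]


def missing_entries(root="."):
    """Return the expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


print("missing entries:", missing_entries("."))
```

If app.py shows up in the returned list, the terminal is most likely not in the IML4 directory.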