EPFL CS-433: PROJECT 1 "Higgs Boson Challenge"

The Higgs boson is an elementary particle that explains why other particles have mass. Measurements taken during high-speed proton collisions at CERN were made public with the aim of predicting whether the collision by-products come from an actual Higgs boson or from background noise.

The work consists of two main stages: data pre-processing, followed by logistic regression.

Preprocessing can include different combinations of the following methods: (1) replacing undefined datapoints by the median/mean, (2) performing a polynomial expansion, (3) standardizing.
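As an illustration of steps (1) and (3), here is a minimal NumPy sketch. It is not the code from `data_processing.py`: the function names are hypothetical, and the sentinel value `-999.0` for undefined datapoints is an assumption about how the dataset encodes them.

```python
import numpy as np

UNDEFINED = -999.0  # assumption: sentinel marking an undefined measurement


def replace_undefined_with_median(x, undefined=UNDEFINED):
    """Replace sentinel entries in each column by the median of the defined values."""
    x = x.astype(float).copy()
    for j in range(x.shape[1]):
        missing = x[:, j] == undefined
        # Skip columns with nothing to replace or nothing to compute a median from.
        if missing.any() and not missing.all():
            x[missing, j] = np.median(x[~missing, j])
    return x


def standardize(x):
    """Center each column to zero mean and scale it to unit standard deviation."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - mean) / std, mean, std


x_raw = np.array([[1.0, -999.0], [3.0, 2.0], [5.0, 4.0]])
x_filled = replace_undefined_with_median(x_raw)
x_std, mean, std = standardize(x_filled)
```

Returning `mean` and `std` makes it possible to apply the same transformation to the test set, so that both sets are scaled with the training statistics.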

Logistic regression models are then fitted and validated using 7-fold cross-validation.

The entire project only uses the Python libraries NumPy and Matplotlib (for visualisation).

Please place the files train.csv and test.csv directly in the repository folder before running the code.

Code description

`run.py`

This file produces the same predictions file that was used to obtain the team's ("no_CS_members") best score on the AIcrowd platform. It is self-contained and only requires access to the data and to the files described below.

`implementations.py`

This file contains the functions required by the project outline PDF:

mean_squared_error_gd, mean_squared_error_sgd, least_squares, ridge_regression
logistic_regression, reg_logistic_regression

It also contains auxiliary functions supporting the ones listed above:

compute_mse_loss, compute_mse_gradient, batch_iter, compute_stoch_mse_gradient, sigmoid, calculate_logistic_loss, calculate_logistic_gradient
calculate_stoch_logistic_gradient, stoch_reg_logistic_regression
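To show how the logistic helpers fit together, here is a minimal sketch of logistic regression trained by full-batch gradient descent. The signatures below are assumptions and may differ from those in `implementations.py`; the sketch also assumes labels in {0, 1}, so labels given as {-1, 1} would need to be remapped first.

```python
import numpy as np


def sigmoid(t):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-t))


def logistic_loss(y, tx, w):
    """Mean negative log-likelihood for labels y in {0, 1}."""
    z = tx @ w
    # logaddexp(0, z) = log(1 + exp(z)), computed in a numerically stable way.
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def logistic_gradient(y, tx, w):
    """Gradient of the mean negative log-likelihood with respect to w."""
    return tx.T @ (sigmoid(tx @ w) - y) / len(y)


def logistic_regression_gd(y, tx, initial_w, max_iters, gamma):
    """Run max_iters gradient descent steps with step size gamma."""
    w = initial_w.copy()
    for _ in range(max_iters):
        w = w - gamma * logistic_gradient(y, tx, w)
    return w, logistic_loss(y, tx, w)


# Toy data: a bias column plus one feature, separable at zero.
tx = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
initial_loss = logistic_loss(y, tx, np.zeros(2))
w, final_loss = logistic_regression_gd(y, tx, np.zeros(2), max_iters=100, gamma=0.1)
y_pred = (sigmoid(tx @ w) >= 0.5).astype(float)
```

A regularized variant would add a penalty term such as `lambda_ * np.sum(w ** 2)` to the loss and `2 * lambda_ * w` to the gradient.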

`data_processing.py`

This file contains functions used to pre-process data.

data_removed, data_replaced, split_data, add_w0
normalize_log_gaussian, normalize_angles, normalize_gaussian, normalize_min_max, normalize

`hyperparams.py`

This file contains functions to optimize the hyperparameter lambda.

build_k_indices, cross_validation, cross_validation_demo
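The sketch below illustrates how k-fold cross-validation can be used to pick lambda. It is an illustration under assumptions, not the code from `hyperparams.py`: the signatures are hypothetical, ridge regression is used as the model because it has a closed-form solution, and the `2 * n * lambda_` scaling is only one common convention.

```python
import numpy as np


def build_k_indices(n_samples, k_fold, seed):
    """Shuffle the sample indices and split them into k_fold equal-sized folds."""
    rng = np.random.default_rng(seed)
    fold_size = n_samples // k_fold
    indices = rng.permutation(n_samples)
    # Samples that do not fit into equal-sized folds are dropped.
    return indices[: fold_size * k_fold].reshape(k_fold, fold_size)


def ridge_regression(y, tx, lambda_):
    """Closed-form ridge regression solution."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    return np.linalg.solve(a, tx.T @ y)


def cross_validate(y, tx, k_indices, lambda_):
    """Average validation MSE over all folds for one value of lambda_."""
    losses = []
    for k in range(len(k_indices)):
        val_idx = k_indices[k]
        train_idx = np.delete(k_indices, k, axis=0).ravel()
        w = ridge_regression(y[train_idx], tx[train_idx], lambda_)
        err = y[val_idx] - tx[val_idx] @ w
        losses.append(np.mean(err ** 2) / 2)
    return float(np.mean(losses))


# Synthetic data, used only to exercise the functions.
rng = np.random.default_rng(0)
tx = rng.normal(size=(70, 3))
y = tx @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=70)

k_indices = build_k_indices(len(y), k_fold=7, seed=1)
lambdas = np.logspace(-4, 0, 5)
cv_losses = [cross_validate(y, tx, k_indices, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv_losses))]
```

Each fold serves as the validation set exactly once, so every retained sample contributes to the estimate of the validation loss.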

It also contains functions to select the best polynomial degree for each feature and to build the corresponding polynomial expansion:

build_poly, best_degree_selection, phi_optimized
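A per-feature polynomial expansion can be sketched as follows. The function name `build_poly_per_feature` and its signature are hypothetical and are not taken from `hyperparams.py`; the sketch assumes that the degree of each feature has already been chosen.

```python
import numpy as np


def build_poly_per_feature(x, degrees):
    """Expand column j of x into powers 1..degrees[j], with a leading constant column."""
    n = x.shape[0]
    columns = [np.ones((n, 1))]
    for j, degree in enumerate(degrees):
        for d in range(1, degree + 1):
            # x[:, [j]] keeps the column two-dimensional so hstack works.
            columns.append(x[:, [j]] ** d)
    return np.hstack(columns)


x = np.array([[1.0, 2.0], [3.0, 4.0]])
# First feature kept linear, second feature expanded up to degree 3.
phi = build_poly_per_feature(x, degrees=[1, 3])
```

The resulting matrix has one column for the constant term plus `sum(degrees)` feature columns, here 1 + 1 + 3 = 5.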

`classification.py`

This file contains functions used to classify the data, as well as some functions for computing evaluation metrics.

simple_class, get_accuracy, get_only_accuracy, get_auc, roc_visualization
get_Kneigbors, getKpredictions
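For illustration, threshold-based classification, accuracy, and a k-nearest-neighbours vote could look like the sketch below. The function names and the {-1, 1} label encoding are assumptions and do not reproduce `classification.py`.

```python
import numpy as np


def classify(scores, threshold=0.5):
    """Map predicted probabilities to labels in {-1, 1}."""
    return np.where(scores >= threshold, 1, -1)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(y_true == y_pred))


def knn_predict(x_train, y_train, x_query, k):
    """Majority vote among the k nearest training points (labels in {-1, 1})."""
    # Squared Euclidean distances, shape (n_query, n_train).
    dists = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)


y_pred = classify(np.array([0.9, 0.2, 0.6]))
acc = accuracy(np.array([1, -1, -1]), y_pred)

x_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([-1, -1, 1, 1])
x_query = np.array([[0.2, 0.1], [5.1, 5.4]])
knn_labels = knn_predict(x_train, y_train, x_query, k=3)
```

The distance matrix in `knn_predict` grows with the product of the query and training set sizes, so this form only suits small samples; larger sets would need to be processed in batches.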

`our_progress_run.ipynb`

A notebook outlining the step-by-step progress of the model (each stage adds something on top of the previous version):

logistic regression
logistic regression + normalized
logistic regression + normalized + w0
logistic regression + normalized smart + w0
logistic regression + normalized smart + w0 + high correlation features removed

`our_progress_loop.py`

This file runs multiple repetitions of each method described in "our progress", in order to compare their mean and standard deviation.

`seven_methods.py`

This file calculates the accuracy of the seven regression and classification methods coded for this project.

A. Gradient Descent with MSE
B. Stochastic Gradient Descent with MSE
C. Least Squares
D. Ridge Regression with cross validation to find best lambda
E. Logistic Regression with cross validation to find best lambda
F. Regularized Logistic Regression
G. K-nearest neighbors classification

`boxplotloop.py`

This file calculates the accuracy of each method on random train sets, in order to build their box plots.

`helpers.py`

Helper functions used to load the data and create the CSV submission.
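As a rough idea of what creating the submission involves, here is a sketch using only the standard library. The function name `write_submission` and the column names `Id` and `Prediction` are assumptions, not taken from `helpers.py`.

```python
import csv
import io


def write_submission(ids, y_pred, handle):
    """Write one (id, prediction) row per sample to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["Id", "Prediction"])  # assumed header names
    for id_, pred in zip(ids, y_pred):
        writer.writerow([int(id_), int(pred)])


# An in-memory buffer stands in for a real file in this example.
buffer = io.StringIO()
write_submission([350000, 350001], [1, -1], buffer)
rows = buffer.getvalue().splitlines()
```

To write to disk instead, the same function can be called with a handle from `open(path, "w", newline="")`.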

Authors

Mathilde Morelli
Iris Toye
Alexei Ermochkine
