Rock-classification

In this problem, you will apply different classification methods. You will use a Rock dataset where you will use 19 different rock features to predict the rock category. The data you need are included in these two files: 1) aggregateRockData.xlsx Download aggregateRockData.xlsxyou will only use 2nd column that contains the rock category number (1 = Igneous, 2 = Metamorphic, 3 = Sedimentary) - that will be the label. 2) norm540.txt Download norm540.txtyou will only use columns 4 to 22 - those will be the attributes (features). See this website for a detailed description of the dataset: https://osf.io/cvwu9/wiki/Data%20File%20Descriptions/.

Display the statistical values for each of the attributes, along with visualizations (e.g., histogram) of the distributions for each attribute. Are there any attributes that might require special treatment? If so, what special treatment might they require?
Analyze and discuss the relationships between the data attributes, and between the data attributes and label. This involves computing the Pearson Correlation Coefficient (PCC) and generating scatter plots.
Select 20% of the data for testing and 20% for validation and use the remaining 60% of the data for training. Describe how you did that and verify that your test and validation portions of the data are representative of the entire dataset.
Train different classifiers and tweak the hyperparameters to improve performance (you can use the grid search if you want or manually try different values). Report training, validation and testing performance (classification accuracy, precision, recall and F1 score) and discuss the impact of the hyperparameters (use markdown cells in Jupyter Notebook to clearly indicate each solution): Multinomial Logistic Regression (softmax regression); hyperparameters to explore: C, solver, max number of iterations.
Support vector machines (make sure to try using kernels); hyperparameters to explore: C, kernel, degree of polynomial kernel, gamma.
Random Forest classifier (also analyze feature importance); hyperparameters to explore: the number of trees, max depth, the minimum number of samples required to split an internal node, the minimum number of samples required to be at a leaf node.
Combine your classifiers into an ensemble and try to outperform each individual classifier on the validation set (try to get above 80% accuracy). Once you have found a good one, try it on the test set. Describe and discuss your findings.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
HomeWork_2_Q_1-Final_Attemp2-2.ipynb		HomeWork_2_Q_1-Final_Attemp2-2.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rock-classification

About

Uh oh!

Releases

Packages

Languages

MXNXV/Rock-classification

Folders and files

Latest commit

History

Repository files navigation

Rock-classification

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages