A simple Python script that uses scikit-learn (sklearn) algorithms to compare binary classifiers on the same dataset.
It outputs the error for each algorithm tested:
$ python3 classifier.py
training time for lda : 0.0252s
test time for lda : 0.0017s
results for lda:
error count: 4029
error rate: 46.04%
performance: 0.5287
confusion matrix:
[[3828 4502]
[ 220 201]]
training time for near : 0.0454s
test time for near : 0.3202s
results for near:
error count: 4398
error rate: 50.257%
performance: 0.5038
confusion matrix:
[[2147 2452]
[1901 2251]]
training time for bayes : 0.0069s
test time for bayes : 0.0011s
results for bayes:
error count: 4085
error rate: 46.68%
performance: 0.5242
confusion matrix:
[[3652 4270]
[ 396 433]]
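
As a rough illustration of how a comparison like the one above might be structured, here is a minimal sketch. It is not the code in classifier.py. The synthetic dataset, the `classifiers` mapping, and the choice of `KNeighborsClassifier` for "near" are all assumptions made for the example. The "performance" figure is left out because this README does not say how it is computed.

```python
import time

from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary dataset standing in for the real data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hypothetical name -> estimator mapping; the labels mirror the output above,
# but which estimator each label refers to is an assumption.
classifiers = {
    "lda": LinearDiscriminantAnalysis(),
    "near": KNeighborsClassifier(),
    "bayes": GaussianNB(),
}

results = {}
for name, clf in classifiers.items():
    # Time the fit and predict steps separately.
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = clf.predict(X_test)
    test_time = time.perf_counter() - start

    # Count misclassified test samples and derive the error rate.
    errors = int((y_pred != y_test).sum())
    error_rate = errors / len(y_test)
    # Rows are true labels, columns are predicted labels.
    cm = confusion_matrix(y_test, y_pred)
    results[name] = {"errors": errors, "error_rate": error_rate, "confusion": cm}

    print(f"training time for {name} : {train_time:.4f}s")
    print(f"test time for {name} : {test_time:.4f}s")
    print(f"results for {name}:")
    print(f"error count: {errors}")
    print(f"error rate: {error_rate:.2%}")
    print("confusion matrix:")
    print(cm)
```

Every classifier is fit and scored on the same train/test split, so the error figures are directly comparable.
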
If working with a UCI dataset, fetch the dataset and set it in classify.py, which should then run the classifiers and produce a ROC curve:
$ python3 classify.py
Getting Dataset
Done.
Results for NaiveBayes
Accuracy: 59.11%
Cross Validation Score: 0.6043
Training Time: 0.0052s
Test Time: 0.002s
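
The figures reported above (accuracy, cross-validation score) and the points of a ROC curve can be computed along these lines. This is a sketch under assumptions, not the contents of classify.py: it uses the breast cancer dataset bundled with scikit-learn (which originates from the UCI repository) as a stand-in, a 70/30 split, and 5-fold cross-validation, none of which are confirmed by this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Stand-in binary dataset shipped with scikit-learn (assumption).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = GaussianNB().fit(X_train, y_train)

# Hold-out accuracy on the test split.
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Mean accuracy over 5 cross-validation folds (fold count is an assumption).
cv_score = cross_val_score(GaussianNB(), X, y, cv=5).mean()

# ROC curve points from the predicted probability of the positive class.
scores = clf.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

print("Results for NaiveBayes")
print(f"Accuracy: {accuracy:.2%}")
print(f"Cross Validation Score: {cv_score:.4f}")
print(f"ROC AUC: {auc:.4f}")
```

The `fpr` and `tpr` arrays are the x and y coordinates of the ROC curve and can be passed to any plotting library.
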