GitHub - fmalato/DecisionTreeAI: AI Project

########################
#### PROGRAM USAGE: ####
########################

1. Set up the program:
Before running the program, either set the local path to the chosen data set or place the data set in
the project folder.
The path is set at several lines in the "demo.py" file: just modify the csv.reader("") argument on each
of them. If the path is not set correctly, the program won't work.

Second, you must either place the target attribute as the rightmost (last) attribute in your data set
or change the targetAttr assignment so that it points to the right values in your data set. I highly
recommend against the second option.

Remember also to set the proper delimiter for the data set (on the same lines as the path setting).
Otherwise, the program won't crash, but it won't work properly either.
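
The setup rules above can be illustrated with a minimal sketch. It is not taken from demo.py: the
load_dataset() helper, the sample data, the header row and the target_attr name (which mirrors
targetAttr) are assumptions made for the example, and it is written for Python 3. It reads a data set
with an explicit delimiter, takes the last column as the target attribute, and shows why a wrong
delimiter fails silently.

```python
import csv
import io

# Hypothetical in-memory data standing in for a real CSV file.
# Assumption: the first row is a header and ";" is the delimiter.
sample = io.StringIO(
    "outlook;windy;play\n"
    "sunny;false;no\n"
    "rainy;true;no\n"
    "overcast;false;yes\n"
)


def load_dataset(handle, delimiter):
    """Read all rows and treat the rightmost column as the target attribute."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    header, records = rows[0], rows[1:]
    target_attr = header[-1]  # rightmost (last) attribute
    return header, records, target_attr


header, records, target_attr = load_dataset(sample, delimiter=";")
print(target_attr)

# With the wrong delimiter nothing crashes, but every row collapses
# into a single column, so no attribute can be used properly.
wrong_header, _, _ = load_dataset(io.StringIO("a;b;c\n1;2;3\n"), delimiter=",")
print(len(wrong_header))
```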

2. How to run the program:
To run the program, just execute the demo.py file, after making sure you have configured it as described
in the previous point.

3. How to read the output:
In the first part, the program prints scores and statistics like these:

--------------------------------------------------
----------- Dataset : Blood Donation -------------
--------------------------------------------------

####################
### iterazione 5 ###
####################

Criterio: gini
Round score: 0.785234899329
Criterio: entropy
Round score: 0.791946308725
Criterio: missclass
Round score: 0.778523489933

Media giniScores: 0.742281879195 Media entrScores: 0.747651006711 Media misClassScores: 0.740939597315

The two banners show the current data set and the current iteration (the labels are printed in Italian:
"iterazione" = iteration, "Criterio" = criterion, "Media" = mean). Then, for each criterion, the program
prints the criterion name followed by the score obtained in the current iteration. The last line reports
the mean of the scores for each criterion. Scores are printed as decimal numbers but should be read as
percentages (e.g. 0.785234899329 is about 78.5%).
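
As a reference for the three criteria named in the output, here is a minimal sketch of the textbook
definitions of Gini impurity, entropy and misclassification error, plus a helper that renders a score
as a percentage. The function names are hypothetical and this is not necessarily how Heuristics.py
computes them.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class among the given labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def misclassification(labels):
    """Misclassification error: 1 - max(p_i)."""
    return 1.0 - max(class_probabilities(labels))


def as_percent(score):
    """Render a decimal score such as 0.785 as a percentage string."""
    return "{:.1f}%".format(score * 100)


labels = ["yes", "yes", "no", "no"]  # hypothetical, evenly split sample
print(gini(labels), entropy(labels), misclassification(labels))
print(as_percent(0.785234899329))
```

All three measures are 0 for a pure set of labels and largest for an even split, which is why each of
them can be used to choose the attribute to split on.
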
In the second part, the program prints the tree as a sequence of names, arrows and values, like this:

density
---> 0.99754
alcohol
---> 9.4
-----> 5
---> 10
-----> 6
---> 8.8
-----> 5

This output follows these rules:
- Each name is a node name, which stands for a specific attribute of the data set;
- Each arrow stands for a branch, i.e. one distinct value of the attribute above the arrow;
- Each number stands for a distinct value of the splitting attribute, which labels that specific branch;
- If a number is followed by further arrows with no attribute name in between, there is no need to split
on another attribute. In that case, every branch leads to a leaf;
- If a number is followed by no more arrows or names, that number is a leaf of the tree.
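
The rules above can be sketched as a small printing routine. The nested-dictionary tree representation
and the format_tree() name are assumptions made for illustration (they are not taken from
DecisionTree.py), and the sketch relies on Python 3.7+ dictionaries keeping insertion order.

```python
def format_tree(tree):
    """Return the lines of the arrow-style printout for a nested-dict tree.

    Assumed layout: {attribute: {value: subtree_or_leaf}}.
    """
    lines = []
    for attribute, branches in tree.items():
        lines.append(str(attribute))              # node name
        for value, subtree in branches.items():
            lines.append("---> " + str(value))    # branch for one attribute value
            if isinstance(subtree, dict):
                lines.extend(format_tree(subtree))  # split on another attribute
            else:
                lines.append("-----> " + str(subtree))  # leaf
    return lines


# Hypothetical tree matching the sample output shown above.
example = {"density": {0.99754: {"alcohol": {9.4: 5, 10: 6, 8.8: 5}}}}
print("\n".join(format_tree(example)))
```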

4. Special thanks:
This section credits the websites and books that helped me during the coding phase.

- Christopher Roach and his article "Building Decision Trees in Python"
(http://archive.oreilly.com/pub/a/python/2006/02/09/ai_decision_trees.html?page=1)
Thanks for giving me tips and tricks, clear explanations (and also ideas) for some functions.
- The StackOverflow Community
(https://stackoverflow.com/)
Thanks for the many answered questions about the topics I was studying.
- S. Russell, P. Norvig, "Artificial Intelligence: A Modern Approach"
Thanks for the exhaustive theoretical explanation of every single aspect of this program and for the
train_test_split() function in the aima-python repository (and also for the puns!)