This repository contains implementations of fundamental Machine Learning algorithms used for Concept Learning in hypothesis space search. The algorithms included are:
- Find-S Algorithm
- List-Then-Eliminate Algorithm
- Candidate Elimination Algorithm
These algorithms are used in machine learning to derive hypotheses from training data, particularly for binary classification problems.
Find-S (Finding the most specific hypothesis) is a simple algorithm used to find the most specific hypothesis that fits the given set of positive training examples. It starts with the most specific hypothesis and generalizes it as needed.
- Initialize the hypothesis
hto the most specific value (⟨?, ?, ?, ..., ?⟩). - For each positive training example:
- Compare it with
h. - Generalize
honly where necessary to match the example.
- Compare it with
- Ignore negative examples (Find-S only works for positive instances).
- The final hypothesis is the most specific one that fits all positive instances.
- Only works for linearly separable data.
- Cannot handle noise or negative examples.
The List-Then-Eliminate algorithm finds the correct hypothesis by maintaining a list of all possible hypotheses and eliminating those that contradict the training data.
- Start with a list of all possible hypotheses.
- For each training example:
- Remove hypotheses that do not match the example.
- At the end, the remaining hypotheses are consistent with the training data.
- Computationally expensive for large hypothesis spaces.
- Requires exhaustive enumeration of hypotheses.
The Candidate Elimination algorithm finds both the most specific (S) and most general (G) hypotheses that are consistent with the training data.
- Initialize the S set with the most specific hypothesis (
⟨∅, ∅, ..., ∅⟩). - Initialize the G set with the most general hypothesis (
⟨?, ?, ..., ?⟩). - For each training example:
- If positive:
- Generalize
Sminimally to include the example. - Remove
Gelements that do not include the example.
- Generalize
- If negative:
- Specialize
Gminimally to exclude the example. - Remove
Selements that include the example.
- Specialize
- If positive:
- Continue until no more training examples exist.
- Finds the complete hypothesis space.
- Handles noise and negative examples better than Find-S.
- Computationally expensive.
- Large hypothesis spaces can make the algorithm slow.
- Python 3.x
- Required libraries:
numpy,pandas
$ git clone https://github.com/satish-tec/machine-learning-algorithms.git
$ cd machine-learning-algorithmsTo run each algorithm:
$ python FindSAlgorithm.py # Run Find-S Algorithm
$ python list_then_eliminate.py # Run List-Then-Eliminate Algorithm
$ python candidate_elimination.py # Run Candidate Elimination AlgorithmModify the datasets and experiment with different training examples.
Data should be in CSV format with the last column as the class label (positive/negative).
Sunny, Warm, Normal, Strong, Warm, Same, Yes
Sunny, Warm, High, Strong, Warm, Same, Yes
Rainy, Cold, High, Strong, Warm, Change, No
Tom M. Mitchell, Machine Learning, McGraw Hill, 1997.
Feel free to fork and submit pull requests. Contributions are welcome!
This repository is licensed under the MIT License.