Genetic String Solver

A python based evolutionary simulation which uses a Genetic Algorithm to evolve random strings into a target phrase. Following the "Infinite Monkey Theorem", but solving it in seconds instead of eons.

The simulation which checks for the sweet-spot for mutation rate is optimised using multiprocessing to get a 5-8x boost.

🚀 Features

Custom Genetic Engine: Implements standard evolutionary operators (Elitist Selection, Uniform Crossover, Bit-Flip Mutation).
High-Performance Parallelism: Uses multiprocessing.Pool to saturate CPU cores, running multiple times faster than standard sequential scripts.
Interactive Visualization: Includes a matplotlib dashboard with interactive sliders to analyze the non-linear relationship between mutation probability and convergence time.
📉 Statistical Robustness: Runs batches of N (user-defined) trials per data point to smooth out stochastic noise.

🛠️ How it works

phraseCracker.py:

The algorithm follows the canonical biological evolution cycle:

Population Initialization: Generate N (user-defined) random strings (DNA).
Fitness Calculation: Score each string based on character matches with the target.
Selection: Isolate the "fittest" strings (top K).
Reproduction: Create a new generation of "children" by: (i) Keeping the top-K fittest strings (ii) Cross-breeding the top M (user-defined) fittest parents with a P (user-defined) chance of random character mutation.
Loop: Repeat until the target string is reached.

mutation_rates_tester.py:

Focuses on benchmarking efficiency the efficiency of this evolutionary process, with respect to different mutation rates.

Variable Definition: The user defines a "Mutation Precision" (e.g., 1%). The tool then generates a range of mutation rates to test, from the precision value up to 100% (e.g., 0.01,0.02,0.03...1.0).
Task Batching: For every specific mutation rate, the tool prepares a batch of N (user-defined) independent test cases (e.g., 50 trials). This ensures that the final data is an average of many runs, reducing the impact of "lucky" or "unlucky" random seeds.
Parallel Execution (The Pool): Instead of running these trials one by one, the tool uses multiprocessing.Pool to distribute the tasks across all available CPU cores. Each core acts as an independent "Evolution Lab," running a full simulation from gibberish to the target string.
Convergence Logging: For each trial, the tool records the generations to converge (how many "years" it took to reach the target).
Data Aggregation: The results are averaged for that specific mutation rate and written to a .csv file.

🔎 Inferences

The graphs above are on the data generated with the following inputs:

Number of test cases: 100
Test case length: 10
Population size: 100
Mutation rate precision: 1

The Stagnation Zone (0-3%):

The algorithm works, but is relatively slow.
At such low mutation rates, the search radius is really small. Most of the times the child is an exact clone of the parents.
When a new mutation does occur, if the letters changed aren't the correct ones, it doesn't affect the fitness score.

Sweet Spot (Optimal Mutation 3-20%):

The convergence time drops drastically (<100 generations). The graph hits it's lowest point.
This represents the perfect balance between Exploration (trying new letters) and Exploitation (keeping the current working ones). The algorithm is aggressive enough to find new answers quickly, but stable enough so as to not break current progress.

Chaos Zone (20-100%):

The time to solve shoots up exponentially.
This is known as Error Catastrophe, the mutations are fast enough that they aren't able to preserve any good genes.
As it approached 100% mutation, the algorithm dissolves into a Random Walk (pure guessing) which is the slowest possible method.

Note:

The graph 1 appears to be Sigmoid, but that is due to an artificially induces ceiling of 10,000 generations.
If a mutation rate didn't reach solution in 10,000 generations, it was cut off.
If left to run till a solution was found the graph would resemble a "check-mark" ✔️ shape, with the no. of generations increasing exponentially with an increase in mutation rate.

How to use

Clone the repo:

git clone https://github.com/arnavlul/genetic-string-solver.git
cd genetic-string-solver

Install Dependencies: pip install numpy matplotlib pandas
Run the simulation: For 'phraseCracker.py' (individual strings):
python phraseCracker.py

For 'mutation_rates_tester.py' (benchmarker), installing pypy is highly recommended.
python mutation_rates_tester.py or pypy mutation_rates_tester.py

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
Graphs		Graphs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
mutation_rates_tester.py		mutation_rates_tester.py
phraseCracker.py		phraseCracker.py
plotter.py		plotter.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Genetic String Solver

🚀 Features

🛠️ How it works

phraseCracker.py:

mutation_rates_tester.py:

🔎 Inferences

How to use

About

Uh oh!

Releases

Packages

Languages

License

arnavlul/genetic-string-solver

Folders and files

Latest commit

History

Repository files navigation

Genetic String Solver

🚀 Features

🛠️ How it works

phraseCracker.py:

mutation_rates_tester.py:

🔎 Inferences

How to use

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages