Skip to content

A high-performance Python implementation of Dawkin's Weasel experiment. Uses multiprocessing and matplotlib to visualise how different mutation rates affect evolutionary convergence speed.

License

Notifications You must be signed in to change notification settings

arnavlul/genetic-string-solver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Genetic String Solver

A python based evolutionary simulation which uses a Genetic Algorithm to evolve random strings into a target phrase. Following the "Infinite Monkey Theorem", but solving it in seconds instead of eons.

The simulation which checks for the sweet-spot for mutation rate is optimised using multiprocessing to get a 5-8x boost.

🚀 Features

  • Custom Genetic Engine: Implements standard evolutionary operators (Elitist Selection, Uniform Crossover, Bit-Flip Mutation).
  • High-Performance Parallelism: Uses multiprocessing.Pool to saturate CPU cores, running multiple times faster than standard sequential scripts.
  • Interactive Visualization: Includes a matplotlib dashboard with interactive sliders to analyze the non-linear relationship between mutation probability and convergence time.
  • 📉 Statistical Robustness: Runs batches of N (user-defined) trials per data point to smooth out stochastic noise.

🛠️ How it works

phraseCracker.py:

The algorithm follows the canonical biological evolution cycle:

  1. Population Initialization: Generate N (user-defined) random strings (DNA).
  2. Fitness Calculation: Score each string based on character matches with the target.
  3. Selection: Isolate the "fittest" strings (top K).
  4. Reproduction: Create a new generation of "children" by: (i) Keeping the top-K fittest strings (ii) Cross-breeding the top M (user-defined) fittest parents with a P (user-defined) chance of random character mutation.
  5. Loop: Repeat until the target string is reached.

mutation_rates_tester.py:

Focuses on benchmarking efficiency the efficiency of this evolutionary process, with respect to different mutation rates.

  1. Variable Definition: The user defines a "Mutation Precision" (e.g., 1%). The tool then generates a range of mutation rates to test, from the precision value up to 100% (e.g., 0.01,0.02,0.03...1.0).
  2. Task Batching: For every specific mutation rate, the tool prepares a batch of N (user-defined) independent test cases (e.g., 50 trials). This ensures that the final data is an average of many runs, reducing the impact of "lucky" or "unlucky" random seeds.
  3. Parallel Execution (The Pool): Instead of running these trials one by one, the tool uses multiprocessing.Pool to distribute the tasks across all available CPU cores. Each core acts as an independent "Evolution Lab," running a full simulation from gibberish to the target string.
  4. Convergence Logging: For each trial, the tool records the generations to converge (how many "years" it took to reach the target).
  5. Data Aggregation: The results are averaged for that specific mutation rate and written to a .csv file.

🔎 Inferences

Evolution Time vs Mutation Rate Graph

Focusing on U-curve

The graphs above are on the data generated with the following inputs:

  • Number of test cases: 100
  • Test case length: 10
  • Population size: 100
  • Mutation rate precision: 1
  1. The Stagnation Zone (0-3%):
  • The algorithm works, but is relatively slow.
  • At such low mutation rates, the search radius is really small. Most of the times the child is an exact clone of the parents.
  • When a new mutation does occur, if the letters changed aren't the correct ones, it doesn't affect the fitness score.
  1. Sweet Spot (Optimal Mutation 3-20%):
  • The convergence time drops drastically (<100 generations). The graph hits it's lowest point.
  • This represents the perfect balance between Exploration (trying new letters) and Exploitation (keeping the current working ones). The algorithm is aggressive enough to find new answers quickly, but stable enough so as to not break current progress.
  1. Chaos Zone (20-100%):
  • The time to solve shoots up exponentially.
  • This is known as Error Catastrophe, the mutations are fast enough that they aren't able to preserve any good genes.
  • As it approached 100% mutation, the algorithm dissolves into a Random Walk (pure guessing) which is the slowest possible method.

Note:

  • The graph 1 appears to be Sigmoid, but that is due to an artificially induces ceiling of 10,000 generations.
  • If a mutation rate didn't reach solution in 10,000 generations, it was cut off.
  • If left to run till a solution was found the graph would resemble a "check-mark" ✔️ shape, with the no. of generations increasing exponentially with an increase in mutation rate.

How to use

  1. Clone the repo:
git clone https://github.com/arnavlul/genetic-string-solver.git
cd genetic-string-solver
  1. Install Dependencies: pip install numpy matplotlib pandas

  2. Run the simulation: For 'phraseCracker.py' (individual strings):
    python phraseCracker.py

  • For 'mutation_rates_tester.py' (benchmarker), installing pypy is highly recommended.
    python mutation_rates_tester.py or pypy mutation_rates_tester.py

About

A high-performance Python implementation of Dawkin's Weasel experiment. Uses multiprocessing and matplotlib to visualise how different mutation rates affect evolutionary convergence speed.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages