This repository contains the code for our paper Eywa: Automating Model-based Testing using LLMs. Our framework uses LLMs to automatically construct modular protocol models from natural-language specifications and applies symbolic execution and differential testing to generate high-coverage tests with minimal user effort.
Please ensure that Docker is installed on your system and accessible to non-root users. Then pull the KLEE Docker image.
$ docker pull klee/klee:3.0

Next, clone the repository onto your local machine.
$ git clone https://github.com/microsoft/Model_Based_Testing_Using_LLMs.git
$ cd Model_Based_Testing_Using_LLMs

We recommend setting up a Python virtual environment to avoid conflicts with pre-installed libraries.
$ python3 -m venv eywa_env
$ source eywa_env/bin/activate

Alternatively, you could use a conda virtual environment.
$ conda create --name eywa_env python=3.10
$ conda activate eywa_env

Install the required libraries. The following command installs eywa into your virtual environment as an importable library.
$ pip3 install -e .

Now, you need to add your OpenAI API key to the scripts folder.
$ cd scripts
$ touch openai_key.txt
$ echo "sk..." > openai_key.txt

(Replace "sk..." with your actual OpenAI API key.) To generate test inputs for differential testing, you must be in the scripts directory. For DNS, the following options are available:
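As a sanity check, the snippet below shows one way to load and validate the key file before running the generation scripts. This helper is illustrative only: the function name and the "sk" prefix check are our own assumptions, not part of the repository.

```python
# Illustrative helper (not part of the repo): load the OpenAI key from
# scripts/openai_key.txt and do a basic sanity check on its format.
from pathlib import Path

def load_api_key(path: str = "openai_key.txt") -> str:
    key = Path(path).read_text().strip()
    # OpenAI keys conventionally start with "sk"; this is only a heuristic check.
    if not key.startswith("sk"):
        raise ValueError(f"{path} does not look like an OpenAI API key")
    return key
```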
$ python3 dns.py -h
usage: dns.py [-h] -m {cname,dname,wildcard,ipv4,full_lookup,loop_count,rcode,authoritative} [-n] [-r RUNS]
options:
-h, --help show this help message and exit
-m {cname,dname,wildcard,ipv4,full_lookup,loop_count,rcode,authoritative}, --module {cname,dname,wildcard,ipv4,full_lookup,loop_count,rcode,authoritative}
The DNS module to generate inputs for.
-t, --test Generate inputs for differential testing.
-r RUNS, --runs RUNS Number of runs to generate inputs for.

For example, to generate test inputs for CNAME with 10 LLM-written models, run the following command:
$ python3 dns.py -n -m cname -r 10

Note that for differential testing, the -n flag must always be enabled.
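If you want inputs for every DNS module in one go, a small driver loop can assemble the commands above. The module names mirror the dns.py help output, but the helper itself (`dns_command`) is a sketch of our own, not part of the repository.

```python
# Hypothetical batch driver: build and run the dns.py command line for each
# DNS module. dns_command() is illustrative and not part of the repository.
import subprocess

DNS_MODULES = ["cname", "dname", "wildcard", "ipv4", "full_lookup",
               "loop_count", "rcode", "authoritative"]

def dns_command(module: str, runs: int = 10) -> list:
    if module not in DNS_MODULES:
        raise ValueError(f"unknown DNS module: {module}")
    # -n enables differential-testing input generation, as noted above.
    return ["python3", "dns.py", "-n", "-m", module, "-r", str(runs)]

if __name__ == "__main__":
    # Must be run from the scripts directory, where dns.py lives.
    for module in DNS_MODULES:
        subprocess.run(dns_command(module), check=True)
```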
For BGP test generation, we have the following options:
$ python3 bgp.py -h
usage: bgp.py [-h] -m {confed,rr,rmap_pl,rr_rmap} [-n] [-r RUNS]
options:
-h, --help show this help message and exit
-m {confed,rr,rmap_pl,rr_rmap}, --module {confed,rr,rmap_pl,rr_rmap}
The BGP module to generate inputs for.
-r RUNS, --runs RUNS Number of runs to generate inputs for.

For instance, to generate test inputs for BGP confederations using 10 LLM-generated models, run the following command:
$ python3 bgp.py -m confed -r 10

For SMTP, there is only one module (you can still select the number of runs):
$ python3 smtp.py -m server -r 10

All generated test cases are stored in the .../tests/{dns|bgp|smtp}/NSDI/{model} folder as appropriate.
Navigate to the tester directory.
$ cd ../tester

To run differential testing with the test inputs generated in the previous step, first navigate to the DNS directory:
$ cd dns

Build the required DNS implementation images by following this README. For differential testing, the following options are available:
$ python3 -m Scripts.test_with_valid_zone_files -h
usage: python3 -m Scripts.test_with_valid_zone_files [-h] [--path DIRECTORY_PATH]
[--id {1,2,3,4,5}] [-r START END] [-b]
[-n] [-k] [-p] [-c] [-y] [-m] [-t] [-e] [-l]
Runs tests with valid zone files on different implementations.
Either compares responses from multiple implementations with each other or uses an
expected response to flag differences (only when one implementation is passed for testing).
optional arguments:
-h, --help show this help message and exit
--path DIRECTORY_PATH The path to the directory containing ZoneFiles and either Queries or
ExpectedResponses directories.
(default: Results/ValidZoneFileTests/)
--id {1,2,3,4,5} Unique id for all the containers (useful when running comparison in
parallel). (default: 1)
-r START END The range of tests to compare. (default: All tests)
-b Disable Bind. (default: False)
-n Disable Nsd. (default: False)
-k Disable Knot. (default: False)
-p Disable PowerDns. (default: False)
-c Disable CoreDns. (default: False)
-y Disable Yadifa. (default: False)
-m Disable MaraDns. (default: False)
-t Disable TrustDns. (default: False)
-e Disable Technitium. (default: False)
-l, --latest Test using latest image tag. (default: False)

Results will be stored in ../../tests/dns/NSDI/{model}/Differences.
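After a run completes, you may want a quick count of how many difference reports were produced. The sketch below simply counts files under a model's Differences directory; the exact layout of the reports inside that folder is an assumption, and the helper is not part of the repository.

```python
# Illustrative sketch (not part of the repo): count difference reports under a
# model's Differences directory, e.g. ../../tests/dns/NSDI/{model}/Differences.
from pathlib import Path

def count_difference_reports(differences_dir: str) -> int:
    root = Path(differences_dir)
    if not root.is_dir():
        return 0
    # Count every file recursively; each report is assumed to be one file.
    return sum(1 for p in root.rglob("*") if p.is_file())
```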
For running differential testing with BGP test inputs, first navigate to the bgp directory.
$ cd bgp

Build the Docker images for the BGP implementations by following this README.
Depending on which feature you want to test, you must cd to the corresponding directory. For example, if you want to test BGP confederations:
$ cd confed

Now, run the following command:
$ python3 diff_testing.py

Results will be saved in the test directory, i.e., ../../tests/bgp/NSDI/{model}.
For running differential testing with SMTP test inputs, first navigate to the smtp folder.
$ cd smtp

Install the required dependencies.
$ sudo apt-get install opensmtpd
$ sudo pip3 install aiosmtpd

Run the following command:
$ sudo python3 diff_testing.py

Results will be saved in the test directory, i.e., ../../tests/smtp/NSDI/SMTP.
To reproduce graphs similar to those in the appendix of the paper, which plot the number of runs against the number of unique tests, navigate to the scripts directory and run the following commands:
$ python3 dns.py -m cname -r 12
$ python3 plot_graphs.py --model cname --runs 12

The available options for models are cname, dname, ipv4, and wildcard.