The code in src/share/XML-Schema-learner originates from https://github.com/kore/XML-Schema-learner
Stable Artifact Link: https://zenodo.org/records/8423505
- Section 6.2 claims that Prolex is able to solve 80% of tasks within the two-minute time limit. Additionally, Figure 14a shows the breakdown of time required for tasks across the different environments. This is shown in Step 2 of the evaluation instructions.
- Section 6.2 claims that even among the most complex tasks (by AST size), 68% can still be completed. This is further broken down in Figure 14b. This is evaluated in Step 3 of the evaluation instructions.
- Section 6.2 claims that 81% of the completed programs are solved, meaning that they semantically match the ground truth program. Since this step is done manually, it is not part of the evaluation instructions. However, if desired, the output programs are written to the output text files, which can be evaluated manually.
- Section 6.3 claims that Prolex is able to solve 8% more tasks than the No LLM variant, 19% more tasks than the No Prune variant, 24% more tasks than the Sketch Only variant, and 8% more tasks than the No Loop Bounds variant, and that the No Sketch variant solved none of the tasks. This claim and the corresponding Figure 15a are shown in Step 4 of the evaluation instructions.
- Section 6.4 claims that the CVC5 baseline solves only 18% of the tasks for the Easy environment. This claim and the CVC5 line in Figure 15b are shown in Steps 5 and 6 of the evaluation instructions.
- Section 6.4 claims that a GPT 3.5 based synthesizer is able to solve 25% of representative tasks when it is given only the environment. Moreover, Figure 16a shows how well the synthesizer scales with larger environments. Figure 16b shows how well the synthesizer is able to complete and solve the representative tasks. However, this analysis required many manual steps, such as evaluating the synthesis output and comparing to ground truth programs, and thus is not evaluated in the artifact.
- The artifact was tested with Docker version 24.0.4; however, we believe the artifact can run on older versions.
- Download the source code into a new directory.
- Once downloaded, navigate to the newly created directory.
- Ensure you have a Docker container manager running in another process.
  - This can be done by executing `sudo dockerd` in another window.
- Execute `./docker_runner.sh` to build the Docker image.
  - This should also run the expected conda setup once the Docker image is built. The conda environment is expected to be `ns_policies_popl` after this step; if it is not, execute `source setup.sh` in the `/` directory.
- After the first build of the Docker image is complete, if you wish to restart the Docker image without rebuilding, simply execute `docker run -it --gpus 'all' nsp_popl_docker /bin/bash`.
  - Note: this will still require the conda environment to be set up upon entry, which should happen automatically.
- Once setup is complete, you should be in the `/src` directory and the `ns_policies_popl` conda environment should be activated.
- As a sanity test for the installation, run `python Synthesizers/Tests/sanity.py` from the `src` directory.
  - Note: `The specified target token ...` output is expected.
- Execute `source data_collection.sh` from the `/src` directory to run Prolex and its ablations on all tasks and all environments. Note: This step may take up to 24 hours to complete. After completion, you may want to copy the generated output files (`GDFSTrue.txt`, `GDFSFalse.txt`, `True.txt`, `False.txt`, `NoSketch.txt`, and `NoLoopBound.txt`) outside of the Docker container so that this step does not have to be repeated. If one of the above text files was not generated or needs to be rerun:
  - `GDFSTrue.txt` can be rerun by running `python Synthesizers/Tests/run_exp.py` with `inf_check = [True]` on line `56`.
  - `GDFSFalse.txt` can be rerun by running `python Synthesizers/Tests/run_exp.py` with `inf_check = [False]` on line `56`.
  - `True.txt` can be rerun by running `python Synthesizers/Tests/run_exp_no_llm.py` with `inf_check = [True]` on line `56`.
  - `False.txt` can be rerun by running `python Synthesizers/Tests/run_exp_no_llm.py` with `inf_check = [False]` on line `56`.
  - `NoSketch.txt` can be rerun by running `python Synthesizers/Tests/run_exp_no_sketches.py`.
  - `NoLoopBound.txt` can be rerun by running `python Synthesizers/Tests/run_exp_no_loop_bound.py`.
- Execute `python prolex_env_plots.py` from the `/src` directory to evaluate Claim 1. Given differences in machine capabilities (GPUs, RAM, and less CPU parallelization), we expect as few as approximately 73% of benchmarks to be completed within the two-minute timeout.
- Execute `python prolex_ast_plots.py` from the `/src` directory to evaluate Claim 2. Given differences in machine capabilities (GPUs, RAM, and less CPU parallelization), we expect as few as approximately 58% of the most difficult AST benchmarks to be completed within the two-minute timeout.
- Execute `python prolex_variants_plots.py` from the `/src` directory to evaluate Claim 4. We expect the differences between the Prolex variants to stay close to those stated in Claim 4.
- Run `raco pkg install rosette` to install the necessary solver. Execute `python cvc5_data_collection.py` from the `/CVC5` directory to generate the data to evaluate Claim 5. Note: This step may take up to 90 minutes.
- Execute `python cvc5_plots.py` from the `/CVC5` directory to evaluate Claim 5. Given differences in machine capabilities (RAM and less CPU parallelization), we expect as few as approximately 15% of benchmarks to be completed within the two-minute timeout.
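The plotting steps above report the share of benchmarks finished within the two-minute timeout. As a minimal sketch of that arithmetic (not the artifact's plotting code), the snippet below assumes a hypothetical list of per-task runtimes in seconds, with `None` marking a task that produced no result. The names `completion_rate` and `TIMEOUT_SECONDS` are illustrative and do not appear in the artifact, and the example runtimes are made up.

```python
from typing import Optional, Sequence

TIMEOUT_SECONDS = 120.0  # the two-minute limit used in the evaluation


def completion_rate(runtimes: Sequence[Optional[float]],
                    timeout: float = TIMEOUT_SECONDS) -> float:
    """Return the percentage of tasks that finished within `timeout` seconds.

    `None` marks a task that produced no result (a hypothetical convention,
    not the format of the artifact's output files).
    """
    if not runtimes:
        raise ValueError("no runtimes given")
    completed = sum(1 for t in runtimes if t is not None and t <= timeout)
    return 100.0 * completed / len(runtimes)


# Made-up runtimes for illustration only; these are not measured results.
example = [3.2, 45.0, None, 119.9, 130.5]
print(f"{completion_rate(example):.1f}% completed")  # prints "60.0% completed"
```

Comparing such a percentage against the expected lower bounds stated above (for example, 73% for Claim 1) is one way to judge whether a run on different hardware is in the expected range.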
- Docker and Conda setup scripts and files
- `CVC5/`: Contains
  - `*.rkt` files, which contain the encoded tasks to be solved by the CVC5 solver.
  - `cvc5_data_collection.py` contains the Python calls to collect data for all CVC5 encodings.
  - `cvc5_plots.py` contains the script to evaluate Claim 5.
- `src/`
  - `dsl.py` contains our DSL definition.
  - `demo.py` contains the format for demonstrations.
  - `data_collection.sh` contains the Python calls to collect data for all Prolex variants.
  - `prolex_env_plots.py` contains the script to evaluate Claim 1.
  - `prolex_ast_plots.py` contains the script to evaluate Claim 2.
  - `prolex_variants_plots.py` contains the script to evaluate Claim 4.
  - `data_parsers/`: Contains the scripts to parse the output data generated from running the data collection step.
    - `Prolex_NoLLM_data_parser.py` parses data from the NoLLM variant.
    - `Prolex_NoLoopBound_data_parser.py` parses data from the NoLoopBound variant.
    - `Prolex_NoPrune_data_parser.py` parses data from the NoPrune variant.
    - `Prolex_NoSketch_data_parser.py` parses data from the NoSketch variant.
    - `Prolex_SketchOnly_data_parser.py` parses data from the SketchOnly variant.
    - `Prolex_data_parser.py` parses data from the full Prolex variant.
  - `share/`: Contains the code for the off-the-shelf XML based Regex learner.
  - `benchmark/`: Contains environments and tasks used for evaluating Prolex.
    - `Environments/`: Contains 6 potential base environments that can be randomized with `f.py`. The most extensive base environment, which is the one used in the experiments, is contained in `E.py`.
    - `Tasks/`: Contains the set of tasks that are executed by Prolex for the experiments.
  - `Env/`: Contains environment class definitions as well as classes for locations, relations, objects, and the robot.
  - `Synthesizers/`: Contains all aspects of the Prolex synthesizer.
    - `Tests/`: Contains the tests used to run the Prolex experiments and ablations.
      - `run_exp.py` runs Prolex, as well as the NoPrune variant.
      - `run_exp_no_llm.py` runs the Prolex NoLLM and SketchOnly variants.
      - `run_exp_no_sketches.py` runs the NoSketch variant of Prolex.
      - `run_exp_no_loop_bound.py` runs the NoLoopBound variant of Prolex.
      - `sanity.py` runs the build sanity check.
    - Sketch Completion
      - `parallel_enumeration.py` is the top level code for executing Prolex sketch completion.
      - `worklist_sketch_completion.py` is the sketch completion algorithm of Prolex.
      - `find_hole.py` is used to find the next hole given a sketch.
      - `ordered_hole_fills_w_prob.py` is used to order potential hole fills given a sketch and the hole to be filled.
      - `fill_holes.py` returns a new sketch in which a selected hole is filled with a selected hole fill.
      - `infeasibility_check.py` is used to evaluate whether or not the current sketch is feasible.
      - `check.py` is used to evaluate whether a completed program matches the given demonstration.
      - `lm_w_prob.py` is used to get the result from the LLM.
      - `get_hole_sentence.py` is used by `lm_w_prob.py` to get the context for the LLM.
    - Sketch Generation
      - `end_to_end_sketch_set_gen.py` is the top level sketch generation from demonstration algorithm.
      - `demo_to_xml.py` converts the given demonstration into a format compatible with the XML based Regex learner.
      - `xml_to_regex_tree.py` applies the Regex learner and uses its output to create an AST of the produced regex.
      - `apply_rewrite_rules.py` is used to create additional regex ASTs using our defined rewrite rules.
      - `translate_regex_tree_to_sketch.py` is used to create a sketch from the regex AST.
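The Sketch Completion files listed above suggest a loop of finding a hole, trying fills, pruning infeasible sketches, and checking completed programs against the demonstration. The following is a toy illustration of such a worklist loop and is not the Prolex implementation: sketches are tuples of tokens with `"??"` as a hole, fills are tried in a fixed order rather than ranked by an LLM, and feasibility is a simple position-by-position comparison against the demonstration. All function names and data here are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Sequence, Tuple

HOLE = "??"
Sketch = Tuple[str, ...]


def find_hole(sketch: Sketch) -> Optional[int]:
    """Index of the first hole, or None if the sketch is complete."""
    for i, tok in enumerate(sketch):
        if tok == HOLE:
            return i
    return None


def is_feasible(sketch: Sketch, demo: Sequence[str]) -> bool:
    """Toy pruning: every filled position must agree with the demonstration."""
    if len(sketch) != len(demo):
        return False
    return all(tok == HOLE or tok == d for tok, d in zip(sketch, demo))


def complete_sketch(sketch: Sketch, fills: Sequence[str],
                    demo: Sequence[str]) -> Optional[Sketch]:
    """Breadth-first worklist search for a completion that matches the demo."""
    worklist: Deque[Sketch] = deque([sketch])
    while worklist:
        current = worklist.popleft()
        if not is_feasible(current, demo):
            continue  # prune: no completion of this sketch can match
        hole = find_hole(current)
        if hole is None:
            return current  # complete and consistent with the demo
        for fill in fills:
            worklist.append(current[:hole] + (fill,) + current[hole + 1:])
    return None


# Hypothetical demonstration and sketch, for illustration only.
demo = ["goto", "grab", "goto", "put"]
print(complete_sketch(("goto", HOLE, "goto", HOLE), ["grab", "put"], demo))
```

Pruning before expanding a hole keeps the worklist small, which is the role the description above assigns to the infeasibility check.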
To add additional tasks to test Prolex:
- Create an additional task file in the `src/benchmark/Tasks/` directory following the format of one of the existing benchmark task files, such as `B6.py`.
- On line `65` of `src/Synthesizers/Tests/run_exp.py`, modify `prog_list` to contain your new task. If you want to run only this new task (recommended for time), set `prog_list` to contain only your new task, e.g. `prog_list = [Your_Task_Name()]`.
- If you want to run only a specific difficulty, modify the `diff_list` on line `54` of `src/Synthesizers/Tests/run_exp.py` to contain one or more of 'e' (easy), 'm' (medium), or 'h' (hard).
- To run different variants of Prolex, follow the steps below:
  - NoLLM: run `python Synthesizers/Tests/run_exp_no_llm.py` with the modified `prog_list` and `inf_check = [True]`
  - NoPrune: run `python Synthesizers/Tests/run_exp.py` with the modified `prog_list` and `inf_check = [False]`
  - SketchOnly: run `python Synthesizers/Tests/run_exp_no_llm.py` with the modified `prog_list` and `inf_check = [False]`
  - NoSketch: run `python Synthesizers/Tests/run_exp_no_sketches.py` with the modified `prog_list` and `inf_check = [True]`
  - NoLoopBound: run `python Synthesizers/Tests/run_exp_no_loop_bound.py` with the modified `prog_list` and `inf_check = [True]`
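The variant list above pairs each variant with a script and an `inf_check` setting. As a small convenience sketch, the lookup table below restates those pairs and prints a reminder of what to edit and run. The `Prolex` entry is inferred from the note that `run_exp.py` runs both Prolex and NoPrune; `VARIANTS` and `describe_run` are hypothetical names, and the scripts are assumed to take no command-line flags, since `inf_check` is edited in the file itself.

```python
from typing import Dict, Tuple

# (script path relative to src/, inf_check value) for each variant.
VARIANTS: Dict[str, Tuple[str, bool]] = {
    "Prolex": ("Synthesizers/Tests/run_exp.py", True),  # inferred, see lead-in
    "NoPrune": ("Synthesizers/Tests/run_exp.py", False),
    "NoLLM": ("Synthesizers/Tests/run_exp_no_llm.py", True),
    "SketchOnly": ("Synthesizers/Tests/run_exp_no_llm.py", False),
    "NoSketch": ("Synthesizers/Tests/run_exp_no_sketches.py", True),
    "NoLoopBound": ("Synthesizers/Tests/run_exp_no_loop_bound.py", True),
}


def describe_run(variant: str) -> str:
    """Return a one-line reminder of the manual edit and command for a variant."""
    script, inf_check = VARIANTS[variant]
    return (f"set inf_check = [{inf_check}] in {script}, "
            f"then run: python {script}")


for name in VARIANTS:
    print(f"{name}: {describe_run(name)}")
```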