This project simplifies code snippets by removing variable initializations and replacing variables in print statements with their corresponding formulas. The simplified code is then executed and analyzed to identify code clones, which are later used to form pairs of clone and non-clone examples for fine-tuning the Tiny-LM.
- Code Generation:
  - The project generates code snippets that contain variable initializations, logic, and print statements.
  - The initial code looks like:

        b = 6
        g = 4
        n = 3
        o = (g + 1) - (n * g)
        if (not b < 4):
            print(o)
        else:
            print(b * n)
- Simplification:
  - Using the `simplify_code_levelx.py` script (where `x` refers to the specific level), variable initializations are removed, and the print statements are modified to include the variable's formula instead of the variable itself.
  - The simplified version of the above code would be:

        if not b < 4:
            print(g + 1 - n * g)
        else:
            print(b * n)
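
The simplification step can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions, not the contents of `simplify_code_levelx.py`: the function name `simplify` and the rule used here (drop assignments of literal constants, inline every other assignment into later uses) are hypothetical, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast
import copy


class _Inline(ast.NodeTransformer):
    """Replace reads of known variables with a copy of their formula."""

    def __init__(self, formulas):
        self.formulas = formulas

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.formulas:
            return copy.deepcopy(self.formulas[node.id])
        return node


def simplify(source: str) -> str:
    """Hypothetical helper: drop initializations and inline formulas."""
    tree = ast.parse(source)
    formulas = {}  # variable name -> expression that defines it
    kept = []
    for stmt in tree.body:
        is_simple_assign = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if is_simple_assign:
            name = stmt.targets[0].id
            if isinstance(stmt.value, ast.Constant):
                # Plain initialization such as `b = 6`: remove it and
                # leave `b` as a free variable.
                formulas.pop(name, None)
            else:
                # Formula such as `o = (g + 1) - (n * g)`: remember it,
                # after inlining any earlier formulas it refers to.
                formulas[name] = _Inline(formulas).visit(stmt.value)
            continue
        kept.append(_Inline(formulas).visit(stmt))
    tree.body = kept
    return ast.unparse(tree)


original = """\
b = 6
g = 4
n = 3
o = (g + 1) - (n * g)
if (not b < 4):
    print(o)
else:
    print(b * n)
"""
print(simplify(original))
```

Because the result is regenerated from the syntax tree, redundant parentheses are dropped and necessary ones are re-inserted according to operator precedence.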
- Execution:
  - Each simplified snippet is executed 150 times with random variable initializations using the `code_execution.py` file.
  - The output of each execution is stored in the `outputs.json` file.
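
A minimal sketch of the execution step, for illustration only. The names `random_env`, `run_once`, and `execute`, the value range 0 to 10, the fixed seed, and the assumption that variables are single lowercase letters are all hypothetical and not taken from `code_execution.py`. Drawing values for every letter from a seeded generator means that two snippets see the same inputs on the same run, which keeps their outputs comparable.

```python
import contextlib
import io
import random
import string


def random_env(rng, low=0, high=10):
    """Assumed input model: one random integer per lowercase letter."""
    return {name: rng.randint(low, high) for name in string.ascii_lowercase}


def run_once(code, env):
    """Execute a snippet with the given variables and capture what it prints."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, dict(env))  # only run trusted, generated code this way
    except Exception as exc:
        # Record failures (e.g. division by zero) instead of aborting.
        return f"error: {type(exc).__name__}"
    return buffer.getvalue().strip()


def execute(code, runs=150, seed=0):
    """Run a snippet `runs` times with reproducible random inputs."""
    rng = random.Random(seed)
    return [run_once(code, random_env(rng)) for _ in range(runs)]


simplified = "if not b < 4:\n    print(g + 1 - n * g)\nelse:\n    print(b * n)"
outputs = execute(simplified)
print(len(outputs))
```

The layout of `outputs.json` is not described here, so the sketch simply returns the list of captured outputs; it could be serialized with `json.dump` in whatever structure the project uses.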
- Clone Identification:
  - Based on the generated outputs, the `identify_clones.py` script is used to detect clone and non-clone code snippets. The clone pairs are stored in the `clones.json` file.
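
One plausible way to detect clones from the recorded outputs is to group snippets whose output sequences match exactly. The sketch below assumes that criterion; the function name `identify_clones` and the input shape (a dict mapping a snippet id to its list of outputs) are hypothetical and may differ from what `identify_clones.py` does.

```python
from collections import defaultdict
from itertools import combinations


def identify_clones(outputs):
    """Return all pairs of snippet ids whose output sequences are identical."""
    groups = defaultdict(list)
    for snippet_id, results in outputs.items():
        # Snippets with the same outputs on every run land in the same group.
        groups[tuple(results)].append(snippet_id)
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(sorted(members), 2))
    return pairs


example_outputs = {
    "s0": ["-7", "12"],
    "s1": ["-7", "12"],
    "s2": ["3", "12"],
    "s3": ["-7", "12"],
}
print(identify_clones(example_outputs))
```

Matching on outputs only shows that two snippets agree on the sampled inputs, so more runs lower the chance that non-equivalent snippets are grouped together by coincidence.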
- Pair Formation:
  - Each code snippet is paired with other snippets to form clone and non-clone pairs. Each snippet is used 4 times: twice as a clone example and twice as a non-clone example.
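
The pairing rule could be sketched as follows, reading "4 times" as each snippet being the first element of four pairs. That reading, the function name `form_pairs`, the `group_of` mapping (snippet id to clone-group label), and the label encoding (1 for clone, 0 for non-clone) are assumptions made for illustration.

```python
import random


def form_pairs(group_of, per_kind=2, seed=0):
    """Build (snippet, partner, label) triples: label 1 = clone, 0 = non-clone."""
    rng = random.Random(seed)
    snippets = sorted(group_of)
    pairs = []
    for s in snippets:
        clones = [t for t in snippets if t != s and group_of[t] == group_of[s]]
        others = [t for t in snippets if group_of[t] != group_of[s]]
        if len(clones) < per_kind or len(others) < per_kind:
            continue  # not enough partners to keep the 2 + 2 balance
        pairs += [(s, t, 1) for t in rng.sample(clones, per_kind)]
        pairs += [(s, t, 0) for t in rng.sample(others, per_kind)]
    return pairs


group_of = {"s0": "A", "s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "B"}
pairs = form_pairs(group_of)
print(len(pairs))
```

Snippets without at least two clone partners are skipped in this sketch, which keeps the resulting set balanced between clone and non-clone examples.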
- `simplify_code_levelx.py`: Script for simplifying code at different levels by removing variable initializations.
- `code_execution.py`: Executes simplified code snippets and stores the results.
- `identify_clones.py`: Identifies clones based on execution results and stores them in `clones.json`.
- `outputs.json`: Stores the outputs from executing the code snippets.
- `clones.json`: Stores the identified clones for further processing.
- Generating Code:
  Run the `automate.py` script to generate code snippets.

      python automate.py --num_programs <number_of_programs>