Comparison of performance-focused pull requests opened by AI coding agents and human developers. This repo contains the curated datasets plus analysis pipelines for review/merge dynamics, optimization-pattern labeling, patch complexity deltas, and validation evidence.
```
.
├── datasets/
│   ├── ai_pr/                              # AI PR parquet bundles + notebook
│   ├── human_pr/                           # Human PR parquet bundles + notebook
│   ├── pr_filtering/                       # Valid perf PR ID filtering notebook
│   └── performance_prs_*.csv               # Unified AI vs human datasets
├── Quantative_analysis/
│   ├── review_time_and_merge_rate/         # Review/merge latency analysis
│   └── patch_size_and_complexity_analysis/ # Lizard-based deltas
├── RQ1_pattern_analysis/                   # Optimization-pattern analysis
├── RQ2_test_and_validation/                # Testing/validation evidence
└── requirements.txt
```
Key folders:

- `datasets/` holds the raw parquet pulls and the curated joins (`performance_prs_ai_vs_human.csv`, `performance_prs_ai_vs_human_raw.csv`) generated by the notebooks in `datasets/ai_pr/` and `datasets/human_pr/`.
- `Quantative_analysis/review_time_and_merge_rate/` contains `rq1_analysis.ipynb` and figures such as `review_time_distribution.png`.
- `Quantative_analysis/patch_size_and_complexity_analysis/` includes CLI scripts (`ai.py`, `human.py`) that fetch patches and run `lizard` (a hedged sketch of that fetch-and-measure flow follows this list), plus `analyze_result.ipynb` for plotting.
- `RQ1_pattern_analysis/` stores the optimization catalog (`catalog/`), LLM labeling notebooks/scripts, mismatch comparisons (`compare_pattern.py`), and results/plots.
- `RQ2_test_and_validation/` hosts GPT/Gemini labeling notebooks, manual label merges, final parquet outputs, and validation plots.
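To make the patch-size step concrete, the sketch below shows one way to fetch a PR's changed files and compare complexity before and after with PyGithub and `lizard`. It is illustrative only, not the actual `ai.py`/`human.py` code; the repository name and PR number are placeholders.

```python
# Illustrative sketch only -- NOT the actual ai.py/human.py implementation.
# Fetches one PR's changed Python files and compares lizard complexity
# before vs. after. Repository name and PR number are placeholders.
import os

import lizard
from github import Github

gh = Github(os.environ["GITHUB_TOKEN"])
repo = gh.get_repo("octocat/Hello-World")  # placeholder repository
pr = repo.get_pull(1)                      # placeholder PR number

for f in pr.get_files():
    if not f.filename.endswith(".py") or f.status == "removed":
        continue
    try:
        before = repo.get_contents(f.filename, ref=pr.base.sha).decoded_content.decode()
    except Exception:
        before = ""  # file added by the PR
    after = repo.get_contents(f.filename, ref=pr.head.sha).decoded_content.decode()

    old = lizard.analyze_file.analyze_source_code(f.filename, before)
    new = lizard.analyze_file.analyze_source_code(f.filename, after)
    print(
        f.filename,
        "NLOC delta:", new.nloc - old.nloc,
        "avg CCN delta:", round(new.average_cyclomatic_complexity
                                - old.average_cyclomatic_complexity, 2),
    )
```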
Data sources:

- Hugging Face `hao-li/AIDev` dataset – the notebooks read the following parquet tables via the `hf://` URI scheme: `pull_request.parquet`, `human_pull_request.parquet`, `pr_task_type.parquet`, `human_pr_task_type.parquet`, `pr_commit_details.parquet`, `all_repository.parquet`.
- Local caches – frequently accessed aggregates are stored under `datasets/` and reused by downstream notebooks.
- LLM outputs – intermediate responses are saved under `RQ1_pattern_analysis/llm_data/` and `RQ2_test_and_validation/llm_data/`.
Authenticate with Hugging Face (`huggingface-cli login`) before running the notebooks so `hf://` reads succeed. Large parquet files are intentionally kept out of Git history.
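As a quick sanity check that authentication and the `hf://` scheme are working, a minimal read along the following lines should succeed. The exact path inside the dataset is an assumption based on the table names listed above; the notebooks remain the source of truth.

```python
# Minimal sketch: read one AIDev parquet table over the hf:// scheme.
# Requires `huggingface_hub` and a prior `huggingface-cli login`.
# The path below assumes the table sits at the dataset root; adjust if not.
import pandas as pd

prs = pd.read_parquet("hf://datasets/hao-li/AIDev/pull_request.parquet")
print(prs.shape)
print(prs.columns.tolist()[:10])
```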
- Use Python 3.10+ and install dependencies:

  ```bash
  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
  pip install lizard PyGithub huggingface_hub  # used by the patch-size scripts
  ```
- Create a `.env` in the repo root for GitHub API access (used by `Quantitative_analysis/patch_size_and_complexity_analysis/*.py`):

  ```
  GITHUB_TOKEN=ghp_your_personal_access_token
  ```

- LLM-specific tooling:
  - `RQ1_pattern_analysis/optimization_pattern_detection_qwen.py` calls a local Ollama endpoint; configure `OLLAMA_HOST` or update the client block if needed.
  - GPT/Gemini notebooks expect API keys via environment variables (e.g., `OPENAI_API_KEY`, `GOOGLE_API_KEY`). Store them in `.env` and load them with `python-dotenv` inside the notebooks.
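A minimal loading cell along those lines might look like the sketch below. Variable names follow the list above; the default Ollama host is an assumption.

```python
# Sketch of the environment loading the notebooks expect.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the repo root / current working directory

GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")          # patch-size scripts
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")      # GPT labeling notebooks
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")      # Gemini labeling notebooks
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")  # assumed default

missing = [k for k, v in {"GITHUB_TOKEN": GITHUB_TOKEN,
                          "OPENAI_API_KEY": OPENAI_API_KEY,
                          "GOOGLE_API_KEY": GOOGLE_API_KEY}.items() if not v]
if missing:
    print("Missing keys (only needed for the corresponding steps):", missing)
```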
Workflow:

- Prepare datasets (once per machine)
  - Run `datasets/pr_filtering/get_valid_pr.ipynb` to regenerate `valid_perf_pr_ids.csv` if you adjust the filtering rules.
  - Execute `datasets/ai_pr/ai_pr.ipynb` and `datasets/human_pr/human_pr.ipynb` to download and cross-check PR metadata, commits, comments, reviews, and workflow runs.
  - The notebooks refresh `datasets/performance_prs_ai_vs_human_raw.csv` and `datasets/performance_prs_ai_vs_human.csv`.
- Quantitative – Review time & merge rate
  - Open `Quantitative_analysis/review_time_and_merge_rate/rq1_analysis.ipynb`.
  - Point the intake cell to `datasets/performance_prs_ai_vs_human.csv`.
  - Running the notebook recreates the summary tables and plots in the same folder (a hedged sketch of the underlying latency/merge-rate computation appears after this workflow list).
- Quantitative – Patch size & complexity impact
  - Ensure `GITHUB_TOKEN` is set, then run:

    ```bash
    python Quantitative_analysis/patch_size_and_complexity_analysis/ai.py
    python Quantitative_analysis/patch_size_and_complexity_analysis/human.py
    ```

  - Use `Quantitative_analysis/patch_size_and_complexity_analysis/analyze_result.ipynb` to regenerate plots in `results/`.
- RQ1 – Optimization pattern analysis
  - Choose a labeling notebook (`RQ1_pattern_analysis/optimization_pattern_detection_gpt.ipynb` or `RQ1_pattern_analysis/optimization_pattern_detection_gemini.ipynb`) to classify each performance PR against the catalog in `RQ1_pattern_analysis/catalog/`.
  - For local models, run:

    ```bash
    python RQ1_pattern_analysis/optimization_pattern_detection_qwen.py
    ```

  - Compare labeler agreement (a hedged sketch of the comparison appears after this workflow list) via:

    ```bash
    python RQ1_pattern_analysis/compare_pattern.py
    ```

  - Plots and summary tables land in `RQ1_pattern_analysis/results/`.
- RQ2 – Testing & validation evidence
  - Start with `RQ2_test_and_validation/validation-gpt.ipynb` or `RQ2_test_and_validation/validation-gemini2.ipynb`.
  - `RQ2_test_and_validation/merge_validation_labels.ipynb` consolidates manual labels under `RQ2_test_and_validation/manual_label/`.
  - `RQ2_test_and_validation/analysis.ipynb` and `RQ2_test_and_validation/validation_plots.ipynb` generate the final parquet and figures in `results/`.
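For the review-time and merge-rate step, the core computation is a latency and merge-rate comparison over the unified CSV. The sketch below is a hedged illustration: the column names (`author_type`, `created_at`, `merged_at`) are assumptions, and `rq1_analysis.ipynb` defines the real schema.

```python
# Hedged sketch of the review-latency / merge-rate comparison.
# Column names are assumptions; check rq1_analysis.ipynb for the real ones.
import pandas as pd

df = pd.read_csv("datasets/performance_prs_ai_vs_human.csv",
                 parse_dates=["created_at", "merged_at"])

df["merged"] = df["merged_at"].notna()
df["review_hours"] = (df["merged_at"] - df["created_at"]).dt.total_seconds() / 3600

summary = df.groupby("author_type").agg(
    prs=("merged", "size"),
    merge_rate=("merged", "mean"),
    median_review_hours=("review_hours", "median"),
)
print(summary)
```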
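Similarly, the RQ1 agreement comparison boils down to joining two label sets on a PR identifier and measuring how often they assign the same catalog pattern. The file paths and column names below are placeholders; `compare_pattern.py` is the authoritative implementation.

```python
# Hedged sketch of a label-agreement check between two LLM labelers.
# File paths and column names are placeholders, not the repo's actual ones.
import pandas as pd

gpt = pd.read_csv("RQ1_pattern_analysis/llm_data/gpt_labels.csv")        # placeholder path
gemini = pd.read_csv("RQ1_pattern_analysis/llm_data/gemini_labels.csv")  # placeholder path

merged = gpt.merge(gemini, on="pr_id", suffixes=("_gpt", "_gemini"))
merged["agree"] = merged["pattern_gpt"] == merged["pattern_gemini"]

print("Overlapping PRs:", len(merged))
print("Raw agreement rate:", round(merged["agree"].mean(), 3))
print(merged.loc[~merged["agree"], ["pr_id", "pattern_gpt", "pattern_gemini"]].head())
```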