Hybrid Architecture for Intelligent Code Error Feedback
A CodeT5-based deep learning assistant that pinpoints Python code errors and delivers mentor-style hints via the OpenAI GPT API, so new coders can learn by solving, not by copying.
New Python learners often get stalled by cryptic red-lined errors. CodeKraft fine-tunes Salesforce's CodeT5 on a curated subset of the TSSB-3M-ext dataset, focusing exclusively on easy and medium difficulty bugs. Once CodeT5 predicts a suggested fix, the pipeline calls OpenAI's GPT API to craft a hint that nudges the student toward understanding the root problem without revealing the full answer.
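The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: `build_mentor_prompt`, `hint_for`, and the prompt wording are hypothetical, and the two models are passed in as plain callables so the sketch runs without CodeT5 weights or an OpenAI key.

```python
from typing import Callable


def build_mentor_prompt(buggy: str, suggested_fix: str) -> str:
    """Compose a prompt that asks for a hint rather than a solution.

    The wording is an assumption made for illustration only.
    """
    return (
        "You are a patient Python mentor. A student wrote this code:\n"
        f"{buggy}\n\n"
        "An automated tool proposes this corrected version "
        "(do NOT show it to the student):\n"
        f"{suggested_fix}\n\n"
        "Explain in one or two sentences what is wrong and where to look, "
        "without writing the corrected code."
    )


def hint_for(
    buggy: str,
    predict_fix: Callable[[str], str],
    ask_mentor: Callable[[str], str],
) -> str:
    """Run both stages: predict a fix, then turn it into a hint."""
    # Stage 1: the code model (CodeT5 in this project) proposes a fix.
    suggested_fix = predict_fix(buggy)
    # Stage 2: the conversational model rewrites that fix as a hint.
    prompt = build_mentor_prompt(buggy, suggested_fix)
    return ask_mentor(prompt)
```

Keeping the models behind callables means either stage can be swapped or stubbed in tests; in a real deployment `predict_fix` would wrap the fine-tuned checkpoint and `ask_mentor` would wrap the GPT API call.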
- Error Localization: Learns common syntactic and logical bug patterns (SStuB) to isolate the problematic snippet.
- Difficulty-Focused Training: Filters for low/medium difficulty examples, boosting reliability on beginner-level mistakes.
- Hybrid Architecture: Combines a specialized code-generation model (CodeT5) with a conversational LLM mentor (OpenAI GPT).
- Efficient Inference: Applies input truncation and batching for sub-second responses on GPU (or CPU fallback).
- Modular Pipeline: Clear separation of data preprocessing, model fine-tuning, and inference for easy customization.
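As a rough illustration of the truncation-and-batching idea in the feature list, the sketch below clips each tokenized snippet to a maximum length and groups snippets into fixed-size batches. The helper names and the use of plain integer lists are assumptions for illustration; with Hugging Face tokenizers, truncation is normally requested through `truncation=True` and `max_length=...` when encoding, rather than done by hand.

```python
from typing import Iterator, List, Sequence


def truncate(tokens: Sequence[int], max_len: int) -> List[int]:
    """Keep at most max_len tokens from the start of the sequence."""
    return list(tokens[:max_len])


def batched(
    items: Sequence[Sequence[int]],
    batch_size: int,
    max_len: int,
) -> Iterator[List[List[int]]]:
    """Yield batches of truncated token sequences, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        yield [truncate(tokens, max_len) for tokens in chunk]
```

Bounding the sequence length caps the cost of each forward pass, and batching amortizes per-call overhead across several snippets.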
(Exact-match accuracy is intentionally omitted; see below for more informative metrics.)
- CodeBLEU: 52.3% on the easy+medium subset (captures syntax & structure similarity)
- Token-level F1: 75.4% (measures partial token overlap for nuanced fixes)
- Inference Latency: ~0.25s per snippet on a Tesla T4 GPU
These metrics better reflect real-world hint usefulness and partial correctness than strict exact matches.
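For readers unfamiliar with token-level F1, one common formulation is sketched below. It assumes whitespace tokenization and bag-of-tokens overlap; the project's evaluation script may tokenize or aggregate differently, so treat this as an illustration of the metric rather than the exact code behind the reported number.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two snippets."""
    pred_tokens = predicted.split()
    ref_tokens = reference.split()
    # Two empty snippets are treated as a perfect match.
    if not pred_tokens and not ref_tokens:
        return 1.0
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both snippets.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a prediction that misses only the colon in `for i in range(5): print(i)` still shares four of five tokens with the reference, whereas exact match would score it as zero.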
git clone https://github.com/kaps117/CodeKraft.git
cd CodeKraft
pip install -r requirements.txt
(Optional) Authenticate with Hugging Face:
export HF_TOKEN="<your_token_here>"
- Fine-tune your model:
python finetune.py \
  --dataset zirui3/TSSB-3M-ext \
  --filter low medium \
  --model Salesforce/codet5-base \
  --output_dir codet5_easy_medium
- Run the API server:
uvicorn serve:app --reload
- Get a hint:
curl -X POST http://localhost:8000/fix \
  -H "Content-Type: application/json" \
  -d '{"pattern": "MissingColon", "buggy": "for i in range(5) print(i)"}'
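The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call above; the helper names `build_fix_request` and `request_hint` are hypothetical, and it assumes the server is running locally on port 8000 and returns a JSON object with a `hint` field.

```python
import json
import urllib.request


def build_fix_request(
    pattern: str,
    buggy: str,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST request for the /fix endpoint without sending it."""
    payload = json.dumps({"pattern": pattern, "buggy": buggy}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/fix",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_hint(pattern: str, buggy: str, timeout: float = 10.0) -> str:
    """Send the request and return the hint text from the JSON response."""
    request = build_fix_request(pattern, buggy)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["hint"]


# Example (requires the server from the previous step to be running):
# print(request_hint("MissingColon", "for i in range(5) print(i)"))
```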
Response:
{"hint": "It looks like your loop header is missing a colon at the end. Try adding `:` after `range(5)` to define the loop block."}
- Expand to high-difficulty bug patterns via curriculum learning
- Dashboard for educator analytics on common errors
- Multi-language support beyond Python