This is our proof of concept for detecting possible package hallucinations in LLM-generated code.
```
LLMPackageGuard/
├── src/
│   ├── scanner/          # Main scanner implementation
│   │   └── scanner.py    # Core scanning logic for package analysis
│   └── api/              # API components
├── extension/            # Chrome extension for web integration
│   └── assets/
│       ├── css/          # Extension stylesheets
│       └── icons/        # Extension icons
├── data/
│   ├── corpus/           # Test corpus of LLM-generated code
│   └── txts/             # .txt files of top packages
└── docs/
    └── images/           # Documentation images
```
- **Scanner** (`src/scanner/`): Core Python scanning engine that analyzes LLM-generated code for potentially malicious or outdated packages
- **Chrome Extension** (`extension/`): Browser extension that integrates with web-based LLM interfaces to scan generated code in real time
- **Data** (`data/`): Test corpus (code randomly generated by LLMs, plus `.txt` files) used for analysis
- **PyPI API**: Using the official PyPI API, package information is queried for a specific version to obtain metadata such as the repository URL, creation date, and source code, which serve as markers for recently created malicious repositories (see the PyPI sketch after this list).
- **OSV API**: OSV (Open Source Vulnerabilities) is a vulnerability database containing CVE information for packages. It is used to report the number and type of known vulnerabilities for each package in your LLM-generated code (see the OSV sketch below).
- **GitHub API**: The GitHub API is used to fetch details of the repository behind a package hosted on PyPI. Information such as the repository creation date, stars, and issues serves as important markers: repositories used in supply chain attacks are typically recently created, with few stars and little community engagement (see the GitHub sketch below).
- **Typosquatting detection**: Typosquatting checks catch cases where the LLM generates package names similar to existing ones. Malicious actors might notice LLMs generating incorrect package names that most people would overlook, and register those names. The check computes the Levenshtein distance between each LLM-generated package name and the top 10,000 PyPI packages (see the edit-distance sketch below).
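
For illustration, here is a minimal sketch of the PyPI check against the public PyPI JSON API. This is not the actual `scanner.py` code; the function name and returned fields are our own:

```python
import requests

def pypi_metadata(package: str) -> dict | None:
    """Query the PyPI JSON API for version/repository metadata.

    Returns None when the package does not exist on PyPI, which is
    itself a strong hallucination signal.
    """
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if resp.status_code != 200:
        return None
    data = resp.json()
    info = data["info"]
    # Approximate the package's creation date from its earliest file upload.
    uploads = [
        f["upload_time_iso_8601"]
        for files in data["releases"].values()
        for f in files
    ]
    return {
        "latest_version": info["version"],
        "repository_urls": info.get("project_urls") or {},
        "first_upload": min(uploads) if uploads else None,
    }
```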
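
The OSV lookup can be sketched in the same spirit (the helper name is ours), using OSV's documented `api.osv.dev` query endpoint:

```python
import requests

def osv_vulnerabilities(package: str, version: str | None = None) -> list[dict]:
    """Return known OSV advisories for a PyPI package (optionally pinned)."""
    query: dict = {"package": {"name": package, "ecosystem": "PyPI"}}
    if version:
        query["version"] = version
    resp = requests.post("https://api.osv.dev/v1/query", json=query, timeout=10)
    resp.raise_for_status()
    # OSV responds with {"vulns": [...]} when advisories exist, else {}.
    return resp.json().get("vulns", [])
```

Counting the returned entries and reading their `id` fields (CVE or GHSA identifiers) gives the number and type of vulnerabilities reported above.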
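
The GitHub markers come from the standard `GET /repos/{owner}/{repo}` endpoint; again a sketch with an assumed helper name:

```python
import requests

def github_repo_markers(owner: str, repo: str, token: str | None = None) -> dict:
    """Fetch repository signals used to judge age and community engagement."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:  # unauthenticated requests are heavily rate-limited
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}", headers=headers, timeout=10
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "created_at": data["created_at"],     # very recent creation is suspicious
        "stars": data["stargazers_count"],    # few stars -> little engagement
        "open_issues": data["open_issues_count"],
        "forks": data["forks_count"],
    }
```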
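
Finally, the typosquatting check reduces to an edit-distance comparison. A self-contained sketch follows; the threshold of 2 and the function names are illustrative, and the real top-10,000 list would be loaded from `data/txts/`:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution
            ))
        prev = curr
    return prev[-1]

def typosquat_candidates(name: str, top_packages: list[str], max_dist: int = 2) -> list[str]:
    """Popular packages within max_dist edits of an LLM-generated name."""
    return [p for p in top_packages if p != name and levenshtein(name, p) <= max_dist]

# e.g. typosquat_candidates("reqeusts", top_packages) should flag "requests".
```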
Run the dependency scanner to analyze Python files for potentially malicious or outdated packages:
```bash
# Navigate to the scanner directory
cd src/scanner

# Scan a specific directory
python scanner.py /path/to/your/python/project

# Scan the default directory - code/
python scanner.py
```

The scanner will:
- Analyze all Python files in the specified directory (import extraction is sketched below)
- Check `requirements.txt` files if present
- Generate a detailed report of findings
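
As a simplified illustration of the analysis step, package names can be extracted from Python source with the standard `ast` module; the actual `scanner.py` logic may differ:

```python
import ast
from pathlib import Path

def imported_packages(path: str) -> set[str]:
    """Collect the top-level package names imported by a Python file."""
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    packages: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            packages.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            packages.add(node.module.split(".")[0])
    return packages
```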
- Load the extension in Chrome from the `extension/` directory
- Run the `server.py` file, which sets up the server for scanning LLM-generated code on LLM interfaces:
```bash
# Navigate to the src directory
cd src

# Run the server script
python3 -m api.server
```

- Navigate to any LLM interface (ChatGPT, Claude, etc.)
- The extension will automatically scan generated Python code for suspicious packages
*Scanning the `requirements.txt` file.*

*Extension scanning the results of ChatGPT-generated code.*
- Coming up with other markers that could give more insight into whether a package is malicious.
- Extending support to other languages; currently only Python is supported.

