
LLMPackageGuard

This is our proof of concept for detecting possible package hallucinations in LLM-generated code.

Project Structure

LLMPackageGuard/
├── src/
│   ├── scanner/          # Main scanner implementation
│   │   └── scanner.py    # Core scanning logic for package analysis
│   └── api/              # API components
├── extension/            # Chrome extension for web integration
│   └── assets/
│       ├── css/         # Extension stylesheets
│       └── icons/       # Extension icons
├── data/ 
│   ├── corpus/          # Test corpus of LLM-generated code
│   └── txts/            # .txt of top packages
└── docs/
    └── images/          # Documentation images

Components

  1. Scanner (src/scanner/): Core Python scanning engine that analyzes LLM-generated code for potentially malicious or outdated packages

  2. Chrome Extension (extension/): Browser extension that integrates with web-based LLM interfaces to scan generated code in real-time

  3. Data (data/): Contains the test corpus of LLM-generated code and .txt lists of the top PyPI packages used for analysis

Scanner integrations

  1. PyPI API: The official PyPI API is queried for a specific version of a package to obtain metadata such as the repository URL, creation date, and source code links, which serve as markers for recently created malicious packages. (Each integration below is sketched in code after this list.)

  2. OSV API: OSV (Open Source Vulnerabilities) is a vulnerability database containing CVE information for packages. It is used to return the number and types of vulnerabilities that exist for the packages in your LLM-generated code.

  3. GitHub API: The GitHub API is used to fetch details of the package's repository linked from PyPI. Information such as the repository creation date, stars, and issues serves as important markers: packages used in supply chain attacks are typically hosted in recently created repositories with few stars and little community engagement.

  4. Typosquatting detection: Typosquatting checks catch cases where the LLM generates package names similar to existing ones; malicious actors may register the slightly wrong names that LLMs tend to produce, since most people would overlook the difference. This is done by computing the Levenshtein distance between the LLM-generated package name and the top 10,000 PyPI packages.
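
As a rough illustration of the PyPI lookup, the sketch below queries the public PyPI JSON API; the helper name and selected fields are illustrative assumptions, not the exact scanner.py logic. A 404 response is itself a strong signal, since it means the package (or version) does not exist on PyPI.

# Minimal sketch: PyPI metadata lookup via the public JSON API.
# Helper name and field selection are illustrative assumptions.
import requests

def pypi_metadata(package, version=None):
    """Fetch metadata for a package (optionally a specific version) from PyPI."""
    url = (f"https://pypi.org/pypi/{package}/{version}/json" if version
           else f"https://pypi.org/pypi/{package}/json")
    resp = requests.get(url, timeout=10)
    if resp.status_code == 404:
        return None  # package/version not on PyPI: likely a hallucinated name
    resp.raise_for_status()
    info = resp.json()["info"]
    return {
        "summary": info.get("summary"),
        "project_urls": info.get("project_urls") or {},  # often includes the source repo URL
    }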
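The OSV lookup can be sketched as a single POST to the public OSV v1 query endpoint; the helper name is an assumption.

# Minimal sketch: query OSV for known vulnerabilities in a PyPI package.
import requests

def osv_vulnerabilities(package, version=None):
    """Return the list of OSV vulnerability records for a PyPI package."""
    query = {"package": {"name": package, "ecosystem": "PyPI"}}
    if version:
        query["version"] = version
    resp = requests.post("https://api.osv.dev/v1/query", json=query, timeout=10)
    resp.raise_for_status()
    return resp.json().get("vulns", [])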
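The GitHub markers can be pulled with one call to the public REST API; which fields feed the final risk assessment is an assumption here.

# Minimal sketch: fetch repository age and engagement markers from GitHub.
import requests

def github_signals(owner, repo):
    """Fetch creation date, stars, and open-issue count for a repository."""
    resp = requests.get(f"https://api.github.com/repos/{owner}/{repo}", timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return {
        "created_at": data["created_at"],        # very recent repos are a risk marker
        "stars": data["stargazers_count"],       # low engagement is a risk marker
        "open_issues": data["open_issues_count"],
    }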
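Finally, the typosquatting check reduces to an edit-distance comparison against the top-packages list in data/txts/; the distance threshold of 2 below is an assumption.

# Minimal sketch: Levenshtein-based typosquatting check.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def typosquat_candidates(package, top_packages, max_dist=2):
    """Popular packages within max_dist edits of the generated name."""
    return [p for p in top_packages
            if p != package and levenshtein(package, p) <= max_dist]

For example, typosquat_candidates("reqeusts", ["requests", "numpy"]) flags "requests" as the likely intended package.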

Usage

Scanner

Run the dependency scanner to analyze Python files for potentially malicious or outdated packages:

# Navigate to the scanner directory
cd src/scanner

# Scan a specific directory
python scanner.py /path/to/your/python/project

# Scan the default directory - code/
python scanner.py

The scanner will:

  • Analyze all Python files in the specified directory (the sketch below shows roughly how package names can be collected)
  • Check requirements.txt files if present
  • Generate a detailed report of findings
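
For intuition, package names can be collected roughly as in the sketch below, which reads imports with the standard ast module and strips version specifiers from requirements.txt lines; the helper names are illustrative, not scanner.py's actual internals.

# Minimal sketch: collect candidate package names from a project.
import ast
from pathlib import Path

def imports_in_file(path):
    """Top-level module names imported by a Python file."""
    tree = ast.parse(Path(path).read_text(), filename=str(path))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def requirements_in_file(path):
    """Package names listed in a requirements.txt file."""
    names = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
                line = line.split(sep)[0]  # drop version specifiers
            names.add(line.strip())
    return names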

Chrome Extension

  1. Load the extension in Chrome from the extension/ directory
  2. Run the server.py module, which sets up the local server for scanning LLM-generated code on LLM interfaces (a hypothetical sketch of such an endpoint follows these steps):
# Navigate to src directory
cd src

# Run the server script
python3 -m api.server
  3. Navigate to any LLM interface (ChatGPT, Claude, etc.)
  4. The extension will automatically scan generated Python code for suspicious packages
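
For orientation, a server of this shape could look like the minimal Flask sketch below; the framework, route, port, and payload format are all assumptions rather than the actual api/server.py implementation.

# Hypothetical sketch of the scanning endpoint the extension posts code to.
# Flask, the /scan route, the port, and the payload shape are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_scanner_checks(code):
    """Stub standing in for the real checks (PyPI, OSV, GitHub, typosquatting)."""
    return []

@app.route("/scan", methods=["POST"])
def scan():
    code = request.get_json().get("code", "")  # code block captured by the extension
    return jsonify({"findings": run_scanner_checks(code)})

if __name__ == "__main__":
    app.run(port=5000)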

Working

Scanning the requirements.txt file:

[image: scanning the requirements.txt file]

Extension scanning the results of ChatGPT-generated code:

[image: extension scanning LLM-generated code]

Future work

  1. Identify additional markers that could give more insight into whether a package is malicious.
  2. Extend support beyond Python to other languages.

Developed by

Samkit Shah

Alvin Manoj Alex
