Detects duplicate IntelliJ plugins with bytecode-based fingerprint analyzer

Code DNA

A command-line tool that creates privacy-preserving "fingerprints" of IntelliJ plugins to detect duplicates and near-copies on the JetBrains Marketplace.

What it does

Think of it like a DNA test for plugins. It analyzes a plugin's structure (classes, methods, API usage) and produces a fingerprint. You can then compare two fingerprints to see whether the plugins are suspiciously similar, which is useful for catching plagiarism or renamed copies.

The fingerprints are privacy-friendly: class and method names are hashed, so the original identifiers can't be read back from a fingerprint. Structural patterns and API usage stay visible, since they are already public information.

Demo

Screencast.webm

How to use it

0. Quick setup

# Clone the repository and navigate to the project directory
git clone git@github.com:mirodilkamilov/codedna.git
cd codedna

# Build the project
./gradlew build

1. Extract fingerprints from plugins

java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/intellij-csv-validator-4.1.0.zip -o plugin1-4.1.dna
java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/intellij-csv-validator-4.0.2.zip -o plugin1-4.0.dna
java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/dotenv-253.25908.13/lib/dotenv.jar -o plugin2.dna

Works with both .jar files and .zip distributions. The tool automatically finds the main plugin JAR inside a ZIP.
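As a rough illustration of how the main JAR can be located inside a ZIP distribution, here is a minimal Python sketch. It is not the tool's actual code. It assumes the usual IntelliJ plugin layout, where JARs sit in `<plugin-dir>/lib/` and the main one contains `META-INF/plugin.xml`. The function name `find_main_plugin_jar` is hypothetical.

```python
import io
import zipfile


def find_main_plugin_jar(plugin_zip):
    """Return the archive path of the first lib/*.jar containing META-INF/plugin.xml.

    `plugin_zip` may be a filesystem path or a binary file-like object.
    Returns None when no such JAR is found.
    """
    with zipfile.ZipFile(plugin_zip) as outer:
        for name in sorted(outer.namelist()):
            parts = name.split("/")
            # Assumption: candidate JARs live at <plugin-dir>/lib/<name>.jar
            if len(parts) == 3 and parts[1] == "lib" and name.endswith(".jar"):
                # A JAR is itself a ZIP archive, so open it from memory.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as jar:
                    if "META-INF/plugin.xml" in jar.namelist():
                        return name
    return None
```

For a plain .jar input this step is skipped and the file is analyzed directly, as in the third `extract` command above.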

2. Compare two fingerprints

# Compare two different plugins
java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin2.dna

# Compare two versions of the same plugin
java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin1-4.0.dna

You'll get a similarity score and breakdown:

java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin2.dna
🔍 Comparing fingerprints...
  Plugin 1: plugin1-4.1.dna
  Plugin 2: plugin2.dna

============================================================
SIMILARITY ANALYSIS
============================================================

Plugin 1: net.seesharpsoft.intellij.plugins.csv v4.1.0
Plugin 2: ru.adelf.idea.dotenv v253.25908.13

Overall Similarity: 13.47%
Verdict: ✓ DIFFERENT

  [██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 🟢
------------------------------------------------------------
DETAILED SIMILARITY BREAKDOWN
------------------------------------------------------------

  Class Shapes      :   0.0% [░░░░░░░░░░░░░░░░░░░░]
  Method Shapes     :   0.0% [░░░░░░░░░░░░░░░░░░░░]
  API Usage         :  32.6% [██████░░░░░░░░░░░░░░]
  Structural        :  53.2% [██████████░░░░░░░░░░]

------------------------------------------------------------
RECOMMENDATION
------------------------------------------------------------

✓ Plugins are DISTINCT implementations.

  • No similarity concerns
  • Proceed with normal review process

============================================================

What gets analyzed

The tool extracts:

Structural metrics (counts):

  • Classes, methods, fields
  • Interfaces, abstract classes, enums
  • Package structure and depth

API usage (hard to fake without changing what the plugin does):

  • Which IntelliJ Platform APIs you call
  • External libraries used
  • Standard Java/Kotlin library usage

Code shapes (hashed for privacy):

  • Class hierarchies and inheritance
  • Method signatures (parameters, return types)
  • All plugin-specific names (packages, classes, methods) are hashed
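To make the API usage categories concrete, here is a small Python sketch that sorts referenced class names into the three groups listed above. The package prefixes, the bucket names, and the example plugin package `com.example.myplugin` are assumptions chosen for illustration; the tool's real classification rules may differ.

```python
from collections import Counter

# Assumed prefixes, for illustration only.
BUCKETS = (
    ("intellij_platform", ("com.intellij.",)),
    ("standard_library", ("java.", "javax.", "kotlin.")),
)


def classify(class_name, own_prefix):
    """Assign a fully qualified class name to a usage bucket."""
    if class_name.startswith(own_prefix):
        return "plugin_internal"  # the plugin's own code, hashed in a fingerprint
    for bucket, prefixes in BUCKETS:
        if class_name.startswith(prefixes):  # str.startswith accepts a tuple
            return bucket
    return "external_library"


def api_usage(referenced_classes, own_prefix):
    """Count how many references fall into each bucket."""
    return Counter(classify(name, own_prefix) for name in referenced_classes)


refs = [
    "com.intellij.psi.PsiFile",
    "java.util.List",
    "kotlin.collections.CollectionsKt",
    "org.apache.commons.csv.CSVParser",
    "com.example.myplugin.Foo",  # hypothetical plugin class
]
usage = api_usage(refs, own_prefix="com.example.myplugin.")
```

Counts like these are what "similar quantities" refers to when two fingerprints are compared on API usage.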

Example output

Here's what a fingerprint looks like before and after hashing. Note that the unhashed form is never stored; it is shown here only for illustration:

[Image: fingerprint structure]

How comparison works

The tool uses multiple signals to determine similarity:

  1. API Usage (35% weight): The strongest signal. If two plugins use the exact same IntelliJ APIs in similar quantities, they're likely related.

  2. Class Shapes (30% weight): Compares hashed class structures. Same inheritance patterns = suspicious.

  3. Method Shapes (25% weight): Looks at method signatures. Identical method "shapes" across many classes = red flag.

  4. Structural Metrics (10% weight): Basic size comparison. Less important since legitimate similar plugins can have similar counts.
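The weighting can be sketched as a plain weighted sum. This is a minimal Python illustration using the four weights listed above; the dictionary keys and the function name are hypothetical, and the tool's exact formula is not shown in this README.

```python
# Weights as listed above; the key names are made up for this sketch.
WEIGHTS = {
    "api_usage": 0.35,
    "class_shapes": 0.30,
    "method_shapes": 0.25,
    "structural": 0.10,
}


def overall_similarity(scores):
    """Combine per-signal scores (in percent, 0-100) into one weighted score."""
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)


# Per-signal values taken from the sample breakdown shown earlier.
sample = {
    "api_usage": 32.6,
    "class_shapes": 0.0,
    "method_shapes": 0.0,
    "structural": 53.2,
}
score = overall_similarity(sample)
```

With these inputs the plain weighted sum comes to about 16.7%, not the 13.47% in the sample output, so the tool's actual weights or formula presumably differ in some detail from this sketch.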

Technical details

  • Hashing: SHA-256 (truncated to 16 chars for readability)
  • Similarity algorithm: Jaccard similarity on hash sets + weighted scoring
  • Privacy: Only plugin-specific names are hashed. Public APIs remain visible.
  • Performance: Analyzes typical plugins in 1-2 seconds
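The hashing and set comparison can be illustrated in a few lines of Python. This is a sketch under assumptions, not the tool's implementation: it assumes the 16 characters are the first 16 hex digits of the SHA-256 digest, and the class names are invented for the example.

```python
import hashlib


def hash_name(name):
    """SHA-256 of a name, truncated to 16 hex characters (assumed encoding)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(a | b)


# Hypothetical class names from two plugins.
plugin_a = {hash_name(n) for n in ("com.example.a.Parser", "com.example.a.Lexer")}
plugin_b = {hash_name(n) for n in ("com.example.a.Lexer", "com.example.b.Formatter")}
similarity = jaccard(plugin_a, plugin_b)  # one shared hash out of three distinct ones
```

Because only hashes are compared, two fingerprints can be matched without either side revealing its original names.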

Limitations

  • The current pairwise comparison doesn't scale to the full Marketplace. For large-scale comparisons across thousands of plugins, consider MinHash and LSH (Locality-Sensitive Hashing)
  • Won't catch heavily obfuscated code (though this is rare in the plugin ecosystem)
  • Focused on structural similarity, not functional equivalence
  • Designed for duplicate detection, not security analysis
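For the scaling limitation, MinHash with LSH banding could look roughly like the Python sketch below. None of this exists in the tool; the function names, the 128 hash functions, and the 32 bands are arbitrary choices for illustration.

```python
import hashlib


def _seeded_hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """One minimum per seeded hash function; `items` must be non-empty."""
    return [min(_seeded_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_bucket_keys(signature, bands=32):
    """Split a signature into bands; plugins sharing any key become candidates."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]
```

Each plugin would be indexed under its bucket keys once, and only plugins that collide in at least one bucket would go through the full pairwise comparison.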

License

This project was created as part of a JetBrains internship application (October 2025). Feel free to use it as reference or inspiration for similar projects.

MIT License - see LICENSE file for details.
