A command-line tool that creates privacy-preserving "fingerprints" of IntelliJ plugins to detect duplicates and near-copies on the JetBrains Marketplace.
Think of it like a DNA test for plugins. It analyzes a plugin's structure (classes, methods, API usage) and creates a unique fingerprint. You can then compare two fingerprints to see if plugins are suspiciously similar, which is useful for catching plagiarism or renamed copies.
The fingerprints are privacy-friendly: class and method names are hashed, so the original identifiers can't be read back from a fingerprint. Structural patterns and API usage stay visible, since they're already public information.
```bash
# Clone the repository and navigate to the project directory
git clone git@github.com:mirodilkamilov/codedna.git
cd codedna
# Build the project
./gradlew build
```
```bash
# Extract fingerprints (.dna files) from plugin distributions
java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/intellij-csv-validator-4.1.0.zip -o plugin1-4.1.dna
java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/intellij-csv-validator-4.0.2.zip -o plugin1-4.0.dna
java -jar build/libs/codedna-1.0-SNAPSHOT.jar extract test_plugins/dotenv-253.25908.13/lib/dotenv.jar -o plugin2.dna
```
Works with both .jar files and .zip distributions. The tool automatically finds the main plugin JAR inside a ZIP.
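The README doesn't say how the main JAR is chosen. One plausible heuristic, sketched here purely as an assumption, is to pick the JAR under a `lib/` directory that contains `META-INF/plugin.xml`, the descriptor file an IntelliJ plugin's main JAR carries. `MainJarFinder` and `findPluginJar` are hypothetical names, not part of codedna.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class MainJarFinder {

    /** Returns the entry name of the first lib/ JAR in the ZIP that contains META-INF/plugin.xml. */
    public static Optional<String> findPluginJar(Path zipPath) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Plugin distributions usually keep their JARs in a lib/ directory
                boolean inLib = name.startsWith("lib/") || name.contains("/lib/");
                if (!inLib || !name.endsWith(".jar")) continue;
                if (containsPluginDescriptor(zip, entry)) return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    /** Streams through a nested JAR without extracting it to disk. */
    private static boolean containsPluginDescriptor(ZipFile zip, ZipEntry jarEntry) throws IOException {
        try (ZipInputStream nested = new ZipInputStream(zip.getInputStream(jarEntry))) {
            ZipEntry inner;
            while ((inner = nested.getNextEntry()) != null) {
                if (inner.getName().equals("META-INF/plugin.xml")) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MainJarFinder.java path/to/plugin.zip
        System.out.println(findPluginJar(Path.of(args[0])).orElse("no plugin JAR found"));
    }
}
```

Reading the nested JAR as a stream keeps the check cheap: only entry headers are inspected, and nothing is written to disk.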
```bash
# Compare two different plugins
java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin2.dna
# Compare two versions of the same plugin
java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin1-4.0.dna
```
You'll get a similarity score and a breakdown:
```
java -jar build/libs/codedna-1.0-SNAPSHOT.jar compare plugin1-4.1.dna plugin2.dna
🔍 Comparing fingerprints...
Plugin 1: plugin1-4.1.dna
Plugin 2: plugin2.dna
============================================================
SIMILARITY ANALYSIS
============================================================
Plugin 1: net.seesharpsoft.intellij.plugins.csv v4.1.0
Plugin 2: ru.adelf.idea.dotenv v253.25908.13
Overall Similarity: 13.47%
Verdict: ✓ DIFFERENT
[██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 🟢
------------------------------------------------------------
DETAILED SIMILARITY BREAKDOWN
------------------------------------------------------------
Class Shapes : 0.0% [░░░░░░░░░░░░░░░░░░░░]
Method Shapes : 0.0% [░░░░░░░░░░░░░░░░░░░░]
API Usage : 32.6% [██████░░░░░░░░░░░░░░]
Structural : 53.2% [██████████░░░░░░░░░░]
------------------------------------------------------------
RECOMMENDATION
------------------------------------------------------------
✓ Plugins are DISTINCT implementations.
• No similarity concerns
• Proceed with normal review process
============================================================
```
The tool extracts:
Structural metrics (counts):
- Classes, methods, fields
- Interfaces, abstract classes, enums
- Package structure and depth
API usage (can't be faked):
- Which IntelliJ Platform APIs you call
- External libraries used
- Standard Java/Kotlin library usage
Code shapes (hashed for privacy):
- Class hierarchies and inheritance
- Method signatures (parameters, return types)
- All plugin-specific names are hashed
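To make "hashed for privacy" concrete, here is a minimal sketch that assumes the hashing scheme the README lists (SHA-256, truncated to 16 characters) is applied to the hex-encoded digest. The `classShape` format, and the rule that a name is plugin-owned when it starts with the plugin's package prefix, are illustrative assumptions rather than codedna's actual fingerprint format; the class and package names in `main` are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class NameHasher {

    /** SHA-256 of the UTF-8 bytes of {@code name}, hex-encoded, keeping the first 16 hex characters. */
    public static String hashName(String name) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            // The first 8 bytes give 16 hex characters
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    /**
     * Hypothetical shape string: plugin-owned names are hashed, while platform and
     * library names (here: anything not starting with pluginPrefix) stay readable.
     */
    public static String classShape(String className, String superClass, int methodCount, String pluginPrefix) {
        String visibleSuper = superClass.startsWith(pluginPrefix) ? hashName(superClass) : superClass;
        return "class " + hashName(className) + " extends " + visibleSuper + " methods=" + methodCount;
    }

    public static void main(String[] args) {
        String prefix = "com.example.myplugin.";
        // Before hashing: com.example.myplugin.CsvAction extends com.intellij.openapi.actionSystem.AnAction
        System.out.println(classShape(prefix + "CsvAction",
                "com.intellij.openapi.actionSystem.AnAction", 3, prefix));
    }
}
```

The plugin's own class name becomes an opaque 16-character hash, while the public `AnAction` superclass stays visible, matching the split between hashed names and public API usage described above.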
Here's what a fingerprint looks like before and after hashing. Note that plain (unhashed) names are never stored; the unhashed form is shown only as guidance:
The tool uses multiple signals to determine similarity:
- API Usage (35% weight): The strongest signal. If two plugins use the exact same IntelliJ APIs in similar quantities, they're likely related.
- Class Shapes (30% weight): Compares hashed class structures. Same inheritance patterns = suspicious.
- Method Shapes (25% weight): Looks at method signatures. Identical method "shapes" across many classes = red flag.
- Structural Metrics (10% weight): Basic size comparison. Less important, since legitimately similar plugins can have similar counts.
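The four weights can be combined as a plain weighted sum. This is a minimal sketch assuming each component score is already normalized to [0, 1]; `sizeRatio` is a hypothetical stand-in for the structural "basic size comparison", whose exact formula the README doesn't give.

```java
public class WeightedScore {

    // Weights as listed in the README; they sum to 1.0
    static final double W_API_USAGE = 0.35;
    static final double W_CLASS_SHAPES = 0.30;
    static final double W_METHOD_SHAPES = 0.25;
    static final double W_STRUCTURAL = 0.10;

    /** Combines four component similarities, each in [0, 1], into one score in [0, 1]. */
    public static double overall(double apiUsage, double classShapes, double methodShapes, double structural) {
        return W_API_USAGE * apiUsage
                + W_CLASS_SHAPES * classShapes
                + W_METHOD_SHAPES * methodShapes
                + W_STRUCTURAL * structural;
    }

    /**
     * Hypothetical size comparison for one structural metric (e.g. class count):
     * the smaller count divided by the larger; two zero counts are treated as identical.
     */
    public static double sizeRatio(long a, long b) {
        if (a == 0 && b == 0) return 1.0;
        return (double) Math.min(a, b) / Math.max(a, b);
    }

    public static void main(String[] args) {
        // Hypothetical inputs, not taken from a real comparison
        double structural = (sizeRatio(120, 100) + sizeRatio(800, 640)) / 2; // average over two metrics
        double score = overall(0.9, 0.8, 0.7, structural);
        System.out.printf("Overall similarity: %.2f%%%n", score * 100); // prints: Overall similarity: 81.17%
    }
}
```

With these weights, API usage and the two shape signals carry 90% of the score, so two plugins that match only in size can reach at most 10% overall similarity.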
- Hashing: SHA-256 (truncated to 16 chars for readability)
- Similarity algorithm: Jaccard similarity on hash sets + weighted scoring
- Privacy: Only plugin-specific names are hashed. Public APIs remain visible.
- Performance: Analyzes typical plugins in 1-2 seconds
Limitations:
- Current pairwise comparison doesn't scale to the full Marketplace. For large-scale comparisons across thousands of plugins, consider MinHash and LSH (Locality-Sensitive Hashing)
- Won't catch heavily obfuscated code (though obfuscation is rare in the plugin ecosystem)
- Focused on structural similarity, not functional equivalence
- Designed for duplicate detection, not security analysis
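The README states that similarity is Jaccard on hash sets and suggests MinHash plus LSH for Marketplace-scale use. The sketch below shows all three pieces: exact Jaccard, a MinHash signature whose per-position agreement rate estimates Jaccard, and LSH band keys. The hash family (a SplitMix64 finalizer over `String.hashCode()`), the signature length of 256, and the band size are assumptions chosen for illustration, not codedna's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinHashSketch {

    /** Exact Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical. */
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** SplitMix64 finalizer: a fast, invertible 64-bit mixing function. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** MinHash signature: position i holds the minimum of hash function i over all items. */
    public static long[] signature(Set<String> items, int k) {
        long[] sig = new long[k];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String item : items) {
            // String.hashCode() is only 32 bits; a production version would use a wider item hash
            long base = item.hashCode();
            for (int i = 0; i < k; i++) {
                // Hash function i: mix(itemHash + seed_i), with a per-position seed derived from i
                long h = mix(base + mix(i + 1));
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    /** Estimated Jaccard similarity: fraction of positions where the two signatures agree. */
    public static double estimate(long[] x, long[] y) {
        int same = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == y[i]) same++;
        }
        return (double) same / x.length;
    }

    /** LSH banding: split the signature into bands of `rows` values; each band becomes a bucket key. */
    public static List<String> bandKeys(long[] sig, int rows) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start + rows <= sig.length; start += rows) {
            long[] band = Arrays.copyOfRange(sig, start, start + rows);
            keys.add((start / rows) + ":" + Arrays.toString(band));
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>();
        Set<String> b = new HashSet<>();
        for (int i = 0; i < 150; i++) a.add("item" + i);  // items 0..149
        for (int i = 50; i < 200; i++) b.add("item" + i); // items 50..199
        long[] sa = signature(a, 256);
        long[] sb = signature(b, 256);
        // Exact value is 100 / 200 = 0.5; the MinHash estimate should land close to it
        System.out.printf("exact=%.3f estimated=%.3f%n", jaccard(a, b), estimate(sa, sb));
    }
}
```

In an LSH index, each band key would map to a bucket of plugin IDs. Only plugins that share at least one bucket need an exact comparison, which avoids comparing every pair of plugins.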
This project was created as part of a JetBrains internship application (October 2025). Feel free to use it as a reference or inspiration for similar projects.
MIT License - see LICENSE file for details.
