This project implements a comprehensive comparison between classical and quantum machine learning approaches for hand-drawn sketch (doodle) classification. The goal is to evaluate the performance, efficiency, and generalization capabilities of both paradigms on the same feature set extracted from stroke-based drawings.
Research question: Can quantum machine learning algorithms provide advantages over classical approaches for sketch recognition tasks when using identical handcrafted feature representations?
Each data sample is stored as a JSON file with the following structure:

```json
{
  "session": 1663053145814,
  "student": "Radu",
  "drawings": {
    "car": [
      [ [x1, y1], [x2, y2], ... ],  // stroke 1
      [ [x1, y1], [x2, y2], ... ],  // stroke 2
      // ... more strokes
    ]
  }
}
```

Vectorized format (strokes flattened into a single coordinate vector):

```json
[
  {
    "id": 1663053145814,
    "label": "car",
    "vector": [x1, y1, x2, y2, ..., xn, yn]
  }
]
```

Stroke-aware format (each point keeps its stroke identity):

```json
[
  {
    "id": 1663053145814,
    "label": "car",
    "sa_vector": [ [x, y, stroke_id], ... ],
    "no_strokes": 3,
    "no_points": 45
  }
]
```

Feature format (handcrafted features per doodle):

```json
[
  {
    "id": ...,
    "label": ...,
    "features": {
      "no_strokes": 3,
      "avg_stroke_length": 45.2,
      "bbox_width": 120.5,
      // ... more features
    }
  }
]
```

The following handcrafted features are computed for each doodle:
| Feature Category | Features | Description |
|---|---|---|
| Stroke Properties | no_strokes, no_points, avg_stroke_length | Basic stroke statistics |
| Geometric Properties | bbox_width, bbox_height, aspect_ratio | Bounding box characteristics |
| Spatial Features | centroid_x, centroid_y, start_x, start_y, end_x, end_y | Position information |
| Shape Complexity | compactness, convex_hull_area, area, perimeter | Shape complexity metrics |
| Symmetry | horizontal_symmetry, vertical_symmetry | Symmetry measures |
| Stroke Quality | straightness | Ratio of Euclidean distance to path length |
- Stroke Analysis: Extract individual strokes from coordinate sequences
- Geometric Calculations: Compute bounding boxes, centroids, areas
- Symmetry Detection: Analyze horizontal and vertical symmetry
- Quality Metrics: Calculate compactness, straightness ratios
- Normalization: StandardScaler for feature scaling
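The steps above can be sketched as follows. This is an illustrative simplification, not the project's `Feature_Extraction.py`; the feature names follow the table above, and the input is assumed to be a list of strokes, each a list of `[x, y]` points:

```python
import numpy as np

def extract_features(strokes):
    """Compute a subset of the handcrafted features for one doodle.

    strokes -- list of strokes, each a sequence of [x, y] points.
    """
    strokes = [np.asarray(s, dtype=float) for s in strokes]
    points = np.vstack(strokes)                     # all points, stroke-blind
    xs, ys = points[:, 0], points[:, 1]

    # Path length of each stroke: sum of segment lengths
    stroke_lengths = [
        np.sum(np.linalg.norm(np.diff(s, axis=0), axis=1)) for s in strokes
    ]

    # Straightness: Euclidean (chord) distance over path length, per stroke
    def straightness(s):
        path = np.sum(np.linalg.norm(np.diff(s, axis=0), axis=1))
        chord = np.linalg.norm(s[-1] - s[0])
        return chord / path if path > 0 else 1.0

    width, height = xs.max() - xs.min(), ys.max() - ys.min()
    return {
        "no_strokes": len(strokes),
        "no_points": len(points),
        "avg_stroke_length": float(np.mean(stroke_lengths)),
        "bbox_width": float(width),
        "bbox_height": float(height),
        "aspect_ratio": float(width / height) if height > 0 else 0.0,
        "centroid_x": float(xs.mean()),
        "centroid_y": float(ys.mean()),
        "straightness": float(np.mean([straightness(s) for s in strokes])),
    }
```

For example, two horizontal line strokes yield `no_strokes = 2` and a straightness of exactly 1.0, since a straight stroke's chord equals its path length.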
```
QC_Project/
├── README.md                      # This file
├── LICENSE                        # MIT License
├── practise.py                    # Experimentation notebook
├── clustering_outlier_removal.py  # Data cleaning pipeline
│
├── data/                          # Raw JSON doodle files
│   ├── 1663053145814.json
│   ├── 1663307917621.json
│   └── ... (100+ doodle files)
│
├── Classical/                     # Classical ML pipeline
│   ├── data_formating.py          # Data preprocessing & vectorization
│   ├── Feature_Extraction.py      # Handcrafted feature computation
│   ├── ML.ipynb                   # Classical ML experiments
│   ├── model_results.csv          # Performance metrics
│   ├── Image_data/                # Stroke visualizations
│   │   ├── car/
│   │   ├── house/
│   │   └── ... (class folders)
│   └── processed_data/            # Intermediate data files
│       ├── vectorized_data.json
│       ├── stroke_vectors.json
│       ├── feature_vectors.json
│       └── clean_features.csv
│
├── Quantum/                       # Quantum ML pipeline
│   └── encodings.py               # Quantum encoding methods
│
├── Dataset_Collection/            # Data collection utilities
├── Lab_Class/                     # Lab experiments
└── Papers/                        # Research references
```
- JSON Loading: Batch processing of doodle files
- Vectorization: Convert strokes to coordinate vectors
- Stroke-Aware Processing: Maintain stroke identity information
- Image Generation: Create PNG visualizations of doodles
- MongoDB Integration: Store/retrieve processed features
- Comprehensive Logging: Track feature extraction progress
- Error Handling: Robust processing of malformed data
- Geometric Computations: Advanced shape analysis algorithms
- Data Preprocessing: StandardScaler normalization, label encoding
- Model Training: KNN, Random Forest, SVM, Neural Networks
- Evaluation: Cross-validation, confusion matrices, classification reports
- Visualization: PCA, t-SNE dimensionality reduction
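The classical training loop can be sketched as below. This is a hedged outline of the notebook's approach, not its exact code; the synthetic `X` and `y` stand in for the real feature matrix loaded from `clean_features.csv`:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature matrix: rows = doodles, columns = handcrafted features
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
y = np.repeat([0, 1, 2], 20)     # three balanced sketch classes

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
}
for name, model in models.items():
    # Scaling inside the pipeline is refit per CV fold, avoiding leakage
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting `StandardScaler` inside the pipeline (rather than scaling once up front) keeps the cross-validation estimate honest.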
- Outlier Detection: Isolation Forest, Local Outlier Factor, Statistical methods
- Class-wise Clustering: Understand intra-class data structure
- Quality Assessment: Silhouette scores, cluster analysis
- Automated Filtering: Remove problematic samples
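A minimal sketch of the Isolation Forest filtering step, under the assumption that each class is cleaned separately; the random `X` is a stand-in for one class's feature rows:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features for one class; real rows come from the MongoDB store
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
X[:5] += 6.0                     # plant a few obvious outliers

iso = IsolationForest(contamination=0.05, random_state=1)
mask = iso.fit_predict(X) == 1   # fit_predict returns 1 = inlier, -1 = outlier
X_clean = X[mask]
print(f"kept {mask.sum()} of {len(X)} samples")
```

The `contamination` parameter sets the expected outlier fraction; in practice it would be tuned per class using silhouette scores or manual inspection.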
- Amplitude Encoding: Normalize feature vectors as quantum amplitudes
- Data Preprocessing: StandardScaler + LabelEncoder compatibility
- MongoDB Integration: Load filtered features for quantum processing
- Qubit Calculation: Automatic qubit requirement computation
- Quantum KNN (QKNN): Distance-based classification in Hilbert space
- Variational Quantum Classifier (VQC): Parameterized quantum circuits
- Quantum Support Vector Machine (QSVM): Kernel methods in quantum space
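Amplitude encoding and the qubit-count calculation can be illustrated with plain NumPy. This is a hedged sketch, not the project's `encodings.py`: a vector of d features needs ceil(log2 d) qubits, and is zero-padded to the next power of two and L2-normalized so its entries form valid quantum amplitudes:

```python
import numpy as np

def amplitude_encode(features):
    """Pad a feature vector to a power-of-two length and L2-normalize it,
    yielding the amplitudes of an n-qubit state plus the qubit count."""
    n_qubits = max(1, int(np.ceil(np.log2(len(features)))))
    padded = np.zeros(2 ** n_qubits)
    padded[: len(features)] = features
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm, n_qubits

state, n = amplitude_encode([0.5, 1.2, -0.3, 0.8, 0.1])
print(n)                      # 5 features pad to 8 amplitudes -> 3 qubits
print(np.sum(state ** 2))     # squared amplitudes sum to 1
```

In the quantum pipeline these amplitudes would then initialize a circuit register (e.g. via Qiskit's state preparation), with the exponential compression (2^n amplitudes on n qubits) being the motivation for this encoding.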
```bash
# Core dependencies
pip install numpy pandas scikit-learn matplotlib seaborn
pip install pymongo pillow scipy

# Quantum computing
pip install qiskit qiskit-aer qiskit-machine-learning

# Optional: for advanced visualizations
pip install plotly dash
```

```bash
# Install and start MongoDB
brew install mongodb-community
brew services start mongodb-community

# Create database and collections
mongosh
> use Doodle_Classifier
> db.createCollection("Extracted_Features")
> db.createCollection("Filtered_Features")
```

```bash
# Process raw JSON files into structured formats
python Classical/data_formating.py

# Extract handcrafted features and store in MongoDB
python Classical/Feature_Extraction.py

# Remove outliers and prepare clean dataset
python clustering_outlier_removal.py

# Open and run the Jupyter notebook
jupyter notebook Classical/ML.ipynb

# Prepare quantum encodings
python Quantum/encodings.py
```

- Accuracy: Classification accuracy on the test set
- Precision/Recall: Per-class performance metrics
- F1-Score: Harmonic mean of precision and recall
- Confusion Matrix: Detailed classification breakdown
- Cross-Validation: Robust performance estimation
- Same Feature Set: Both classical and quantum models use identical features
- Consistent Preprocessing: Same normalization and encoding pipeline
- Fair Evaluation: Identical train/test splits and evaluation metrics
- Computational Complexity: Training time and resource usage comparison
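The shared-preprocessing guarantee above can be sketched as follows. This is an illustrative outline (the synthetic data and seed value are assumptions): fixing `random_state` and fitting the scaler on the training split only gives both pipelines identical, leakage-free inputs:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical data; in the project this comes from the filtered feature store
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 16))
labels = rng.choice(["car", "house", "tree"], size=50)

# Identical encoding and split for the classical and quantum pipelines
y = LabelEncoder().fit_transform(labels)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # fixed seed => same split
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
```

Both model families then consume `X_train_s` / `X_test_s`, so any performance gap reflects the algorithm rather than the preprocessing.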
- Classical Baseline: Establish strong classical performance benchmarks
- Quantum Advantage: Identify scenarios where quantum algorithms excel
- Feature Importance: Understand which features benefit most from quantum processing
- Scalability Analysis: Performance trends with dataset size and feature dimensionality
- Complete Quantum Implementation: Finish QKNN and VQC algorithms
- Hyperparameter Optimization: Grid search for both classical and quantum models
- Feature Selection: Identify optimal feature subsets for quantum advantage
- Noise Analysis: Study robustness to quantum hardware noise
- Hybrid Classical-Quantum Models: Combine both paradigms
- Real Quantum Hardware: Experiments on IBM Quantum, IonQ platforms
- Scalability Studies: Performance with larger datasets and feature spaces
- Novel Quantum Features: Quantum-native feature extraction methods
- Quantum Machine Learning - Biamonte et al. (2017)
- Quantum algorithms for supervised and unsupervised machine learning - Lloyd et al. (2014)
- The quest for a Quantum Neural Network - Kwek et al. (2021)
- Quick, Draw! Dataset (Google)
- Custom Doodle Collection (This Project)
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Quantum Computing Community: For open-source quantum frameworks
- Classical ML Libraries: Scikit-learn, NumPy, Pandas ecosystems
- Data Contributors: Students who provided doodle samples
- Research Inspiration: Academic papers on quantum machine learning
Author: Rithvik Rajesh
Repository: classical-quantum-sketch-ml
Issues: Please use GitHub Issues for bug reports and feature requests
This project represents an exploration into the frontier of quantum machine learning, comparing traditional and quantum approaches on a practical computer vision task. The goal is to contribute to our understanding of when and where quantum algorithms may provide advantages in machine learning applications.