Background
To enable efficient deployment, we need to design a student model for knowledge distillation from the current teacher model. The student model must meet the following constraints:
- Parameter count < 10,000
- Target accuracy >98% (on the same evaluation metric as the teacher)
Requirements
- Propose and implement a student model architecture (CNN, MLP, LTC, or a hybrid) with total parameters strictly less than 10,000; see the architecture sketch after this list.
- Integrate the full knowledge distillation pipeline: teacher inference, soft-target KD loss, optional feature/attention distillation, and hard-target loss; see the loss sketch after this list.
- Provide a parameter count calculation and a verification script (the architecture sketch below includes one).
- Experiment with various student model designs (different channel/hidden sizes, number of layers, etc.) to achieve the best trade-off between size and accuracy.
- Document all architectural choices, hyperparameters, and training tricks used to reach >98% accuracy.
- Provide training logs, learning curves, and test accuracy.
- Discuss any limitations and suggest further improvements.
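As a starting point (not a prescribed design), here is a minimal PyTorch sketch of a possible student, assuming a 28x28 single-channel input and 10 classes; the layer sizes are illustrative only, and the count_parameters helper doubles as the verification script:

```python
import torch.nn as nn

class TinyStudent(nn.Module):
    """Illustrative depthwise-separable CNN kept well under the 10k-parameter budget."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1),               # 1*8*3*3 + 8      = 80
            nn.BatchNorm2d(8), nn.ReLU(),                # 8 + 8            = 16
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(8, 8, 3, padding=1, groups=8),     # depthwise: 8*3*3 + 8   = 80
            nn.Conv2d(8, 16, 1),                         # pointwise: 8*16 + 16   = 144
            nn.BatchNorm2d(16), nn.ReLU(),               # 16 + 16          = 32
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
            nn.Conv2d(16, 16, 3, padding=1, groups=16),  # depthwise: 16*3*3 + 16 = 160
            nn.Conv2d(16, 32, 1),                        # pointwise: 16*32 + 32  = 544
            nn.BatchNorm2d(32), nn.ReLU(),               # 32 + 32          = 64
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)     # 32*10 + 10       = 330

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def count_parameters(model: nn.Module) -> int:
    """Verification helper: total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

if __name__ == "__main__":
    student = TinyStudent()
    total = count_parameters(student)
    print(f"Total trainable parameters: {total}")   # 1,450 for the sizes above
    assert total < 10_000, "student exceeds the 10k parameter budget"
```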
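And a sketch of the combined distillation objective (soft targets from the frozen teacher plus hard-label cross-entropy); the temperature T and weight alpha are typical hyperparameters to tune, not values fixed by this issue:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.7):
    """Hinton-style soft-target KD loss combined with the hard-label cross-entropy.

    T     -- temperature used to soften both distributions
    alpha -- weight on the distillation term vs. the hard-label term
    """
    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Inside the training loop (teacher frozen, inference only):
#   with torch.no_grad():
#       teacher_logits = teacher(images)
#   loss = kd_loss(student(images), teacher_logits, labels)
```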
Acceptance Criteria
- Student model Python code (<10,000 parameters, with a parameter count calculation)
- Training script with knowledge distillation (can reuse teacher code as needed)
- Achieved >98% accuracy on the target dataset
- Documentation: architecture, parameter count, training details, and results
- (Optional) Visualization: confusion matrix, feature t-SNE, etc.
Notes
- If necessary, use a Teacher Assistant (two-stage distillation) or FitNet-style intermediate feature distillation for additional performance; a hint-loss sketch follows below.
- Use strong regularization and data augmentation to help small models generalize; an example recipe follows below.
- If the goal cannot be reached, document the best effort and the obstacles encountered.
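For the FitNet option, a minimal hint-loss sketch, assuming intermediate feature maps can be exposed from both networks; the 1x1 regressor that matches channel widths is standard FitNet machinery, and the channel arguments here are placeholders:

```python
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """FitNet-style feature distillation: regress student features onto teacher features."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapts the student's (narrower) feature map to the teacher's width.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        projected = self.regressor(student_feat)
        # Match spatial size if the two networks downsample differently.
        if projected.shape[-2:] != teacher_feat.shape[-2:]:
            projected = F.adaptive_avg_pool2d(projected, teacher_feat.shape[-2:])
        return F.mse_loss(projected, teacher_feat.detach())
```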
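For regularization and augmentation, one possible recipe, assuming an MNIST-like image dataset (the target dataset is not specified in this issue); the transform values and normalization statistics are assumptions to adjust:

```python
from torchvision import transforms

# Light geometric augmentation plus label smoothing and weight decay tend to help
# very small models generalize; the specific values below are starting points only.
train_transform = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST statistics (assumption)
])

# e.g. optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3, weight_decay=1e-4)
# e.g. hard_loss = nn.CrossEntropyLoss(label_smoothing=0.1)
```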
Labels: enhancement, question