Background
To enable efficient deployment, we need to design a student model for knowledge distillation from the current teacher model. The student model must meet the following constraints:
- Parameter count < 10,000
- Target accuracy >98% (on the same evaluation metric as the teacher)
Requirements
- Propose and implement a student model architecture (CNN, MLP, LTC, or a hybrid) with total parameters strictly less than 10,000; see the architecture sketch after this list.
- Integrate the full knowledge distillation pipeline: teacher inference, soft-target KD loss, optional feature/attention distillation, and hard-target loss; see the loss sketch after this list.
- Provide a parameter count calculation and a verification script (the architecture sketch below includes one).
- Experiment with various student model designs (different channel/hidden sizes, number of layers, etc.) to achieve the best trade-off between size and accuracy.
- Document all architectural choices, hyperparameters, and training tricks used to reach >98% accuracy.
- Provide training logs, learning curves, and test accuracy.
- Discuss any limitations and suggest further improvements.
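As a starting point (not a prescribed design), here is a minimal PyTorch sketch of a possible student, assuming a 28x28 single-channel input and 10 classes; the layer sizes are illustrative only, and the count_parameters helper doubles as the verification script:

```python
import torch.nn as nn

class TinyStudent(nn.Module):
    """Illustrative depthwise-separable CNN kept well under the 10k-parameter budget."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1),               # 1*8*3*3 + 8      = 80
            nn.BatchNorm2d(8), nn.ReLU(),                # 8 + 8            = 16
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(8, 8, 3, padding=1, groups=8),     # depthwise: 8*3*3 + 8   = 80
            nn.Conv2d(8, 16, 1),                         # pointwise: 8*16 + 16   = 144
            nn.BatchNorm2d(16), nn.ReLU(),               # 16 + 16          = 32
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
            nn.Conv2d(16, 16, 3, padding=1, groups=16),  # depthwise: 16*3*3 + 16 = 160
            nn.Conv2d(16, 32, 1),                        # pointwise: 16*32 + 32  = 544
            nn.BatchNorm2d(32), nn.ReLU(),               # 32 + 32          = 64
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)     # 32*10 + 10       = 330

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def count_parameters(model: nn.Module) -> int:
    """Verification helper: total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

if __name__ == "__main__":
    student = TinyStudent()
    total = count_parameters(student)
    print(f"Total trainable parameters: {total}")   # 1,450 for the sizes above
    assert total < 10_000, "student exceeds the 10k parameter budget"
```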
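And a sketch of the combined distillation objective (soft targets from the frozen teacher plus hard-label cross-entropy); the temperature T and weight alpha are typical hyperparameters to tune, not values fixed by this issue:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.7):
    """Hinton-style soft-target KD loss combined with the hard-label cross-entropy.

    T     -- temperature used to soften both distributions
    alpha -- weight on the distillation term vs. the hard-label term
    """
    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Inside the training loop (teacher frozen, inference only):
#   with torch.no_grad():
#       teacher_logits = teacher(images)
#   loss = kd_loss(student(images), teacher_logits, labels)
```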
Acceptance Criteria
- Student model Python code (<10,000 parameters, with a parameter count calculation)
- Training script with knowledge distillation (can reuse teacher code as needed)
- Achieved >98% accuracy on the target dataset
- Documentation: architecture, parameter count, training details, and results
- (Optional) Visualization: confusion matrix, feature t-SNE, etc.
Notes
- If necessary, use a Teacher Assistant (two-stage distillation) or FitNet-style intermediate feature distillation for additional performance; a hint-loss sketch follows below.
- Use strong regularization and data augmentation to help small models generalize; an example recipe follows below.
- If the goal cannot be reached, document the best effort and the obstacles encountered.
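For the FitNet option, a minimal hint-loss sketch, assuming intermediate feature maps can be exposed from both networks; the 1x1 regressor that matches channel widths is standard FitNet machinery, and the channel arguments here are placeholders:

```python
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """FitNet-style feature distillation: regress student features onto teacher features."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 conv adapts the student's (narrower) feature map to the teacher's width.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        projected = self.regressor(student_feat)
        # Match spatial size if the two networks downsample differently.
        if projected.shape[-2:] != teacher_feat.shape[-2:]:
            projected = F.adaptive_avg_pool2d(projected, teacher_feat.shape[-2:])
        return F.mse_loss(projected, teacher_feat.detach())
```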
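For regularization and augmentation, one possible recipe, assuming an MNIST-like image dataset (the target dataset is not specified in this issue); the transform values and normalization statistics are assumptions to adjust:

```python
from torchvision import transforms

# Light geometric augmentation plus label smoothing and weight decay tend to help
# very small models generalize; the specific values below are starting points only.
train_transform = transforms.Compose([
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST statistics (assumption)
])

# e.g. optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3, weight_decay=1e-4)
# e.g. hard_loss = nn.CrossEntropyLoss(label_smoothing=0.1)
```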
Labels: enhancement, question