# Code and Pretrained Models for: evMLP: An Efficient Event-Driven MLP Architecture for Vision
This is a highly experimental implementation of evMLP. Training code, pre-trained models, and evaluation scripts will be updated in the near future.
Please refer to `requirements.txt`. If you prefer not to install all the dependencies listed there, torch>=2.0.0 is required; it is also best to install recent versions of einops and thop so that all operators are supported and the computational cost is calculated correctly.
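If you only want the minimum, a requirements file consistent with the note above might look like the fragment below (only the torch version floor comes from this README; leaving einops and thop unpinned simply pulls the latest releases):

```text
torch>=2.0.0
einops
thop
```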
You can train on ImageNet-1K by modifying `references/classification/train.py` from torchvision.
To train the models in the paper (default configuration in `evmlp.py`), you can use the following settings (using 4 GPUs):

```shell
torchrun --nproc_per_node=4 \
    train.py \
    --auto-augment imagenet \
    --label-smoothing 0.1 \
    --random-erase=0.1 \
    --mixup-alpha 0.2 \
    --cutmix-alpha 1.0 \
    --epochs 300 \
    --batch-size 256 \
    --opt sgd \
    --lr 0.1 \
    --momentum 0.9 \
    --lr-scheduler cosineannealinglr \
    --lr-min 0.00001 \
    --lr-warmup-method=linear \
    --lr-warmup-epochs=5 \
    --workers 8 \
    --wd 0.00001 \
    --data-path /path/to/dataset
```

Here are the pre-trained models:
Models trained under the old configuration (deprecated):
evmlp_b_224_imagenet1k.pth: Using the default configuration in `evmlp.py`, trained from scratch on ImageNet-1K.
Available models:
| Model Name | Input Size | MACs (G) | Params (M) | Top-1 Acc | Epochs | Teacher Model |
|---|---|---|---|---|---|---|
| evmlp_224_imagenet1k_ep300_73.5.pth | 224x224 | 1.0 | 38.4 | 73.5% | 300 | - |
| evmlp_224_imagenet1k_ep300_distilled_75.4.pth | 224x224 | 1.0 | 38.4 | 75.4% | 300 | EfficientNetV2-S* |
| evmlp_224_imagenet1k_ep800_distilled_77.0.pth | 224x224 | 1.0 | 38.4 | 77.0% | 800 | EfficientNetV2-S* |
\* EfficientNetV2-S: top-1 81.31% @ 224x224, top-1 84.23% @ 384x384
Process videos using eval_video_dir.py:

```shell
python eval_video_dir.py <weights.pth> <dir_path> <event_threshold>
```

For example, download the model file evmlp_b_224_imagenet1k.pth, place the video files in /path/to/videos, and use an event threshold of 0.05:

```shell
python eval_video_dir.py evmlp_b_224_imagenet1k.pth /path/to/videos 0.05
```

eval_video_dir.py uses opencv-python to load video files. The default filter list only supports video files with the extensions .avi and .mp4. If necessary, you can edit the following code (line 31 of eval_video_dir.py):

```python
video_extensions = {'.avi', '.mp4'}
```

Q: Can evMLP be used for other computer vision tasks besides image classification?
A: Certainly. Because the feature maps reconstructed by evMLP via the rearrange operation preserve the adjacency relationships between neuron patches relative to the input image, the architecture is directly applicable to tasks such as object detection and segmentation. Examples of applying evMLP to other tasks will be added when time permits.
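To illustrate the event-driven idea behind the `event_threshold` parameter, here is a hedged sketch (not the repository's actual implementation; the patch size, the per-patch mean-absolute-difference event measure, and all names are assumptions): only patches whose content changed more than a threshold between consecutive frames would need to be recomputed.

```python
import numpy as np

def changed_patches(prev_frame, cur_frame, patch=16, threshold=0.05):
    """Return a boolean mask over the patch grid marking patches whose
    mean absolute change between two frames exceeds `threshold`."""
    h, w = cur_frame.shape[:2]
    gh, gw = h // patch, w // patch
    diff = np.abs(cur_frame.astype(np.float64) - prev_frame.astype(np.float64))
    # Average the difference within each patch (and over channels, if any).
    diff = diff.reshape(gh, patch, gw, patch, -1).mean(axis=(1, 3, 4))
    return diff > threshold

# Toy usage: two 64x64 frames that differ only in the top-left 16x16 patch,
# so exactly one patch in the 4x4 grid fires an "event".
prev = np.zeros((64, 64))
cur = prev.copy()
cur[:16, :16] = 1.0
mask = changed_patches(prev, cur, patch=16, threshold=0.05)
print(mask.sum())  # → 1
```

A higher threshold skips more patches (less computation, more approximation); a lower one recomputes more of the frame.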
Q: Why does the execution time increase even though the number of MACs decreases?
A: This repository only provides experimental Python code. Consider the following two snippets:

Code 1:

```python
import numpy

N = 1_000_000  # example array size
a = numpy.random.rand(N)
total = 0.
for x in a:
    total += x
```

Code 2:

```python
import numpy

N = 1_000_000  # example array size
a = numpy.random.rand(N)
total = a.sum()
```

Even though both snippets sum the array `a`, Code 2 can run significantly faster than Code 1: the loop in Code 1 executes element by element in the Python interpreter, while `a.sum()` is a single vectorized NumPy operation. For practical applications, the code can be implemented in C/C++. Alternatively, an FPGA implementation is also a great option.