Question about LLM inference performance after pruning

Thank you for providing such outstanding research!

I tested the LLaMA-7B model, and after pruning, neither the memory usage nor the inference speed differs significantly from the original model. Does your work describe any methods for accelerating inference with pruned models?

```
GPU: NVIDIA A6000
torch 2.2.0
transformers 4.31.0
accelerate 0.21.0
```