Hanshi Wang, Yuhao Xu, Zekun Xu, Jin Gao, Weiming Hu, Zhipeng Zhang
*This work was completed during Hanshi’s remote internship at AutoLab, SJTU.
[2025.9.18] AutoPrune is accepted by NeurIPS 2025.
The established redundancy of visual tokens in large vision-language models (LVLMs) allows pruning to effectively reduce their substantial computational demands. Empirical evidence from previous works indicates that visual tokens in later decoder layers receive less attention than those in shallow layers. Previous methods therefore typically employ heuristic, layer-specific pruning strategies in which, although the number of tokens removed may differ across decoder layers, the overall pruning schedule is fixed and applied uniformly to all input samples and tasks, failing to align token elimination with the model's holistic reasoning trajectory. Cognitive science indicates that human visual processing often begins with broad exploration to accumulate evidence before narrowing focus as the target becomes distinct. Our experiments reveal an analogous pattern in LVLMs. This observation strongly suggests that neither a fixed pruning schedule nor a heuristic layer-wise strategy can optimally accommodate the diverse complexities inherent in different inputs. To overcome this limitation, we introduce Complexity-Adaptive Pruning (AutoPrune), a training-free, plug-and-play framework that tailors pruning policies to varying sample and task complexities. Specifically, AutoPrune quantifies the mutual information between visual and textual tokens, and then projects this signal onto a budget-constrained logistic retention curve. Each such curve, defined by its unique shape, is shown to correspond to the specific complexity of different tasks while guaranteeing adherence to a pre-defined computational constraint. We evaluate AutoPrune not only on standard vision-language tasks but also on Vision-Language-Action (VLA) models for autonomous driving. Notably, when applied to LLaVA-1.5-7B, our method prunes 89% of visual tokens and reduces inference FLOPs by 76.8% while still retaining 96.7% of the original accuracy averaged over all tasks. This corresponds to a 9.1% improvement over the recent work PDrop (CVPR 2025), demonstrating its effectiveness.
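As a rough illustration (consistent with the hyperparameters documented in the usage section below, not necessarily the exact implementation), the retained fraction of visual tokens at decoder layer $l$ can be pictured as a logistic curve whose slope adapts to the measured mutual information (MI):

$$
r(l) = \frac{1}{1 + e^{\,k(\mathrm{MI})\,(l - x_0)}}, \qquad k(\mathrm{MI}) = \max\big(k_0 - \gamma \cdot \mathrm{MI},\ 0\big),
$$

with the curve then constrained so that the total number of retained tokens meets the pre-defined budget.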
- Clone this repository.

```bash
git clone https://github.com/AutoLab-SAI-SJTU/AutoPrune.git
cd AutoPrune
```

- Install necessary packages.

```bash
conda create -n AutoPrune python=3.10 -y
conda activate AutoPrune
pip install -e .
```

- (Optional) Install FlashAttention for further inference acceleration.

```bash
pip install flash-attn --no-build-isolation
```

Download the corresponding LLaVA checkpoints from Hugging Face 🤗 (an example download snippet follows the table):
| Version | LLM | Checkpoint |
|---|---|---|
| LLaVA-1.5 | Vicuna-7B | liuhaotian/llava-v1.5-7b |
| LLaVA-1.5 | Vicuna-13B | liuhaotian/llava-v1.5-13b |
| LLaVA-1.6 (LLaVA-NeXT) | Vicuna-7B | liuhaotian/llava-v1.6-vicuna-7b |
| LLaVA-1.6 (LLaVA-NeXT) | Vicuna-13B | liuhaotian/llava-v1.6-vicuna-13b |
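For reference, a minimal way to fetch a checkpoint programmatically, assuming you keep weights under `./models/` as in the evaluation command below (any other Hugging Face download method works just as well):

```python
from huggingface_hub import snapshot_download

# Download LLaVA-1.5-7B into ./models/ (path assumption matching the
# --model-path used in the evaluation example below).
snapshot_download(repo_id="liuhaotian/llava-v1.5-7b",
                  local_dir="./models/llava-v1.5-7b")
```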
Download each dataset according to EVAL.md.
Using TextVQA as an example (`scripts/v1_5/eval/textvqa.sh`), inference is controlled by a few hyperparameters that shape the visual-token retention curve:

- `--visual-token-num`: Initial number of visual tokens produced by the vision tower (LLaVA-1.5: 576; LLaVA-1.6: 2880). This is an upper bound; pruning will dynamically reduce it.
- `--target-token-num`: Target visual-token budget. In the scripts, the first positional argument `TOKEN` is passed here. Smaller values prune more aggressively; larger values keep more tokens.
- `--x0`: Horizontal shift of the logistic retention curve. Increasing `x0` delays strong pruning to later layers (keeping more tokens early); decreasing it starts shrinking earlier.
- `--k0` and `--gamma`: Control the MI-adaptive slope of the curve (see the sketch after this list).
  - Internally we compute `dynamic_k = max(-gamma * MI + k0, 0)` and use it as the slope of the logistic curve.
  - Intuition: `k0` sets the base steepness (larger → sharper), while `gamma` controls sensitivity to sample complexity (mutual information; larger → more sensitive).
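A minimal, self-contained sketch of how such a schedule could be built from these quantities. This is an illustration only, not the repository's implementation; in particular, the budget constraint is assumed here to be a simple rescaling of the curve:

```python
import numpy as np

def retention_schedule(num_layers=32, visual_token_num=576, target_token_num=64,
                       mi=0.5, x0=14.9, k0=0.4, gamma=0.2):
    """Illustrative per-layer visual-token retention schedule (sketch only)."""
    # MI-adaptive slope, as described in the bullet list above.
    dynamic_k = max(-gamma * mi + k0, 0.0)
    layers = np.arange(num_layers)
    # Logistic decay over decoder layers: retain most tokens before x0,
    # shrink afterwards; a larger dynamic_k makes the transition sharper.
    frac = 1.0 / (1.0 + np.exp(dynamic_k * (layers - x0)))
    tokens = frac * visual_token_num
    # Budget constraint (assumed: rescale so the mean retained tokens per
    # layer match the target budget).
    tokens = tokens * (target_token_num / tokens.mean())
    return np.clip(tokens, 0, visual_token_num).round().astype(int)

print(retention_schedule())  # per-layer visual-token counts
```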
How to run (direct Python invocation is equivalent to the shell script):
```bash
python -W ignore -m llava.eval.model_vqa_loader \
    --model-path ./models/llava-v1.5-7b \
    --question-file ./playground/data/eval/textvqa/llava_textvqa_val_v051_ocr.jsonl \
    --image-folder ./playground/data/eval/textvqa/train_images \
    --answers-file "${OUT_JSONL}" \
    --visual-token-num 576 \
    --temperature 0 \
    --conv-mode vicuna_v1 \
    --x0 14.9 \
    --k0 0.4 \
    --gamma 0.2 \
    --target-token-num ${TOKEN}
```

Or run the shell script directly:

```bash
CUDA_VISIBLE_DEVICES=2 bash scripts/v1_5/eval/textvqa.sh 64
```

- The trailing `64` is the `TOKEN` argument; the script forwards it to `--target-token-num` as your visual-token budget (smaller → more pruning).
- v1_5 scripts fix `--visual-token-num 576`; v1_6 scripts fix `--visual-token-num 2880`.
Tuning tip: Optimal settings may vary by dataset/task. You can tune `--x0` / `--k0` / `--gamma` per dataset for the best results. We did not perform fine-grained hyperparameter tuning in order to demonstrate robustness, so with proper tuning AutoPrune is likely to surpass the results reported in our paper.
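If you do want to sweep these settings, one simple (hypothetical) approach is to call the direct Python invocation shown above with a small grid of curve parameters and compare the resulting accuracies offline; the candidate values and the `./answers/` output directory below are placeholders:

```python
import itertools
import os
import subprocess

os.makedirs("./answers", exist_ok=True)

# Placeholder grid around the defaults used in the example above.
for x0, k0, gamma in itertools.product([12.0, 14.9, 17.0], [0.3, 0.4], [0.1, 0.2]):
    out = f"./answers/textvqa_x0{x0}_k0{k0}_g{gamma}.jsonl"
    subprocess.run([
        "python", "-W", "ignore", "-m", "llava.eval.model_vqa_loader",
        "--model-path", "./models/llava-v1.5-7b",
        "--question-file", "./playground/data/eval/textvqa/llava_textvqa_val_v051_ocr.jsonl",
        "--image-folder", "./playground/data/eval/textvqa/train_images",
        "--answers-file", out,
        "--visual-token-num", "576",
        "--temperature", "0",
        "--conv-mode", "vicuna_v1",
        "--x0", str(x0), "--k0", str(k0), "--gamma", str(gamma),
        "--target-token-num", "64",
    ], check=True)
```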
If you find AutoPrune useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{wang2025autoprune,
  title={Each Complexity Deserves a Pruning Policy},
  author={Hanshi Wang and Yuhao Xu and Zekun Xu and Jin Gao and Yufan Liu and Weiming Hu and Ke Wang and Zhipeng Zhang},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}
```

This project is released under the Apache 2.0 license.
AutoPrune uses code from several open-source repositories. Without the efforts of these folks (and their willingness to release their implementations), AutoPrune would not be possible. We thank these authors for their efforts!