An error occurred during testing. #179

@wz1114841863

I wrote a test script based on the example in the KVzap README, but it fails at runtime and I have not been able to pinpoint the cause.
My modified test code is as follows:

import os

# Set before importing transformers so the offline flag is already in place at import time.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import pipeline
from kvpress import KVzapPress, DMSPress


model = "Qwen/Qwen3-8B"
pipe = pipeline("kv-press-text-generation", model=model, device_map="auto", dtype="auto", local_files_only=True)
kvzap_press = KVzapPress(model_type="mlp")
press = DMSPress(kvzap_press, threshold=-4)

print(f"load successfullly")

press.decoding = False
context = """
    This is an example article about machine learning. Machine learning is a subset of artificial intelligence
    that focuses on building systems that learn from data. Recent advances in deep learning have revolutionized
    many fields including computer vision, natural language processing, and speech recognition.
    Transformer models like BERT and GPT have shown remarkable performance on various NLP tasks.
    The field continues to evolve with new architectures and training techniques being developed regularly.

    In this paper, we introduce a novel approach to attention mechanisms that improves efficiency
    while maintaining performance. Our method reduces computational complexity from O(n^2) to O(n log n)
    for sequence length n. Experiments on benchmark datasets show competitive results with
    state-of-the-art models while using significantly less memory and computation time.
    """
question = "\n What is this article about in 2 sentences ?"
answer = pipe(context, question=question, press=press)["answer"]
print(f"Answer:{answer}")

press.decoding = True
prompt = "What is the best hardware to run LLMs and why ?"
answer = pipe(prompt, press=press, enable_thinking=True, max_new_tokens=2000)["answer"]
print(f"Answer:{answer}")

The error message is as follows:

Loading checkpoint shards: 100%|████████████████████████████████████| 5/5 [00:03<00:00,  1.55it/s]
Device set to use cuda:0
load successfullly
Answer:¾-var!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Traceback (most recent call last):
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/./test_kvpress.py", line 40, in <module>
    answer = pipe(prompt, press=press, enable_thinking=True, max_new_tokens=2000)["answer"]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1467, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1474, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1374, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/kvpress/pipeline.py", line 217, in _forward
    self.model.model(
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 1072, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 410, in forward
    hidden_states = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/transformers/models/qwen3/modeling_qwen3.py", line 260, in forward
    hidden_states, _ = self.self_attn(
                       ^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1881, in _call_impl
    return inner()
           ^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1840, in inner
    hook_result = hook(self, args, kwargs, result)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/myserver/workplace1/zsy/zsy_2/kvpress/kvpress/presses/dms_press.py", line 93, in forward_hook
    self.scores_buffer[layer_idx] = torch.cat([self.scores_buffer[layer_idx], scores], dim=-1)

I have not modified any other files. Where might the problem be?
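
The traceback above is cut off before the final exception message, so the installed package versions may help narrow this down. Below is a minimal sketch for collecting them. It assumes the distributions are named `torch`, `transformers`, `accelerate`, and `kvpress` (names inferred from the paths in the traceback), and `installed_versions` is a hypothetical helper, not part of any of these libraries:

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names, taken from the paths in the traceback.
    report = installed_versions(["torch", "transformers", "accelerate", "kvpress"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run inside the same `.venv` as the test script, this prints one line per package and can be pasted alongside the traceback.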
