
Significant Performance Gap Between Reproduced Results and Reported Benchmarks #10

@YafengWu

Description

I successfully downloaded the official HuggingFace model weights and deployed the model using vLLM on my local server for evaluation. I then ran the provided main.py script to benchmark performance on the listed datasets.
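For context, the model is invoked roughly as in the sketch below (a simplified illustration using vLLM's offline `LLM` API rather than my actual server launch; the model path, dtype, and prompt are placeholders, not the exact configuration I used):

```python
from vllm import LLM, SamplingParams

# Placeholder path -- in my setup this points at the official HuggingFace
# weights downloaded to the local server.
llm = LLM(model="/path/to/downloaded/weights", dtype="bfloat16")

# Greedy decoding; max_tokens is an illustrative value.
sampling = SamplingParams(temperature=0.0, max_tokens=256)

# One prompt per evaluation window (prompt construction omitted here).
outputs = llm.generate(["<prompt built from one test window>"], sampling)
print(outputs[0].outputs[0].text)
```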

Although the implementation appears correct, my results differ substantially from those reported in the paper, especially on the ETT datasets. For example, my average MSE on ETTm1 is around 33.9, while the paper reports 13.1.

I have verified that the evaluation script correctly averages results across all windows and variables. Could you please advise whether I might be missing any critical evaluation or preprocessing steps?
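For reference, the averaging I verified amounts to the following (array names, file names, and shapes are illustrative only, not the actual main.py code):

```python
import numpy as np

# Hypothetical dumps of forecasts and ground truth for one dataset (e.g. ETTm1).
# Assumed shape: (num_windows, horizon, num_variables).
preds = np.load("preds_ettm1.npy")
targets = np.load("targets_ettm1.npy")

# Single mean over every window, horizon step, and variable.
mse = np.mean((preds - targets) ** 2)
print(f"average MSE: {mse:.4f}")
```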
