(NeurIPS 2025) Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

[时序人] [时空探索之旅] [QuantML] [时序之心] [时序大模型]

🔍 About | 🚀 Quick Start | 📊 Evaluation | 🔗 Citation

🔍About

This is the official repository for the NeurIPS 2025 paper "Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting". The paper proposes DMMV, a decomposition-based multi-modal view (MMV) framework for long-term time series forecasting (LTSF). DMMV combines trend-seasonal decomposition with a novel backcast-residual based adaptive decomposition to integrate MMVs of time series with large vision models (LVMs).

🔧Framework

Traditional time series forecasting models often rely on a single view (e.g., numerical, language, visual), overlooking the complementary information that can be integrated across different modalities. The proposed Decomposition-based Multi-Modal View (DMMV) framework addresses this limitation by jointly modeling the numerical and visual views of time series within a unified architecture.

As illustrated in Figure 1, DMMV consists of two variants, DMMV-S and DMMV-A.

DMMV-S: Uses a moving-average kernel to decompose the series into seasonal and trend parts, processed by the Visual and Numerical Forecasters, respectively.
DMMV-A: Leverages the Visual Forecaster for both forecasting and backcasting to reconstruct seasonal components, while adaptively using the Numerical Forecaster to model the residual trend component.
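
The moving-average split used by DMMV-S can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the default kernel size of 25, and the edge-replication padding are assumptions chosen for illustration (they follow a common convention in trend-seasonal decomposition code).

```python
def moving_average(series, kernel_size):
    """Centered moving average; edges are padded by repeating the end values
    (an assumed padding scheme) so the output has the same length as the input."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]


def decompose(series, kernel_size=25):
    """Split a series into (seasonal, trend), where trend is the moving average
    and seasonal is what remains after subtracting it."""
    trend = moving_average(series, kernel_size)
    seasonal = [x - t for x, t in zip(series, trend)]
    return seasonal, trend


if __name__ == "__main__":
    seasonal, trend = decompose([1.0, 2.0, 3.0, 4.0, 5.0], kernel_size=3)
    print("trend:", trend)
    print("seasonal:", seasonal)
```

In this sketch the seasonal part would go to the Visual Forecaster and the trend part to the Numerical Forecaster, matching the description of DMMV-S above.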

Both variants share the following core components:

Visual Forecaster: Utilizes a pre-trained LVM to reconstruct the masked regions of input imaged time series, effectively capturing periodic and local patterns.
Numerical Forecaster: A general series-to-series predictor that models global trends. It can be implemented as a linear-layer or Transformer-based forecaster that reads the numerical view of time series.
Fusion Gate: An adaptive gating mechanism that integrates the outputs from both forecasters, balancing trend and periodic information to produce the final forecast.
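
To make the role of the Fusion Gate concrete, here is a minimal sketch of one possible gating scheme: a convex combination of the two forecasts controlled by a sigmoid gate. The scalar gate, the names `fuse` and `gate_logit`, and the exact combination rule are assumptions for illustration; the gate used in the repository may be parameterized differently.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(visual_forecast, numerical_forecast, gate_logit):
    """Blend two forecasts of equal length.

    gate_logit is a hypothetical learnable scalar; after the sigmoid it gives
    the weight g on the visual forecast, and (1 - g) goes to the numerical one.
    """
    if len(visual_forecast) != len(numerical_forecast):
        raise ValueError("forecasts must have the same horizon length")
    g = sigmoid(gate_logit)
    return [g * v + (1.0 - g) * n
            for v, n in zip(visual_forecast, numerical_forecast)]


if __name__ == "__main__":
    # A gate logit of 0 weights both forecasters equally.
    print(fuse([1.0, 1.0], [3.0, 3.0], gate_logit=0.0))
```

A large positive logit pushes the output toward the Visual Forecaster and a large negative logit toward the Numerical Forecaster, which is one simple way to balance periodic and trend information.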


Figure 1: An overview of the DMMV framework. (a) DMMV-S uses a moving average to extract trend and seasonal components. (b) DMMV-A uses a backcast-residual decomposition to automatically learn trend and seasonal components. In (b), the gray blocks are gray-scale images and "?" marks the masked regions.

🔑 Key Features

Multi-Modal Integration: Jointly models numerical and visual views of time series while making use of the strengths of LVM forecasters and numerical forecasters.
Decomposition: Introduces a novel adaptive backcast–residual decomposition framework that can harness LVMs’ inductive biases.
Modular Compatibility: Supports various LVMs (e.g., MAE, SimMIM) and numerical forecasters (e.g., Linear, PatchTST) for flexible deployment.

🚀 Quick Start

Clone the Repository

git clone https://github.com/D2I-Group/dmmv.git
cd dmmv

Set Up the Environment

We recommend using conda or virtualenv to create an isolated environment.

python3 -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt

Download the Datasets

- Obtain the pre-processed datasets from the Google Drive folder provided by Time-Series-Library.
- Place the downloaded data in the ./dataset folder.
- Here is a summary of the benchmark datasets.

Run the Code

- Make sure the environment and settings are correctly configured.
- Run bash scripts/DMMV-A/ETTh1.sh to start.

📊 Evaluation

DMMV is comprehensively compared with 14 state-of-the-art (SOTA) models on 8 benchmark datasets across domains. The baseline methods cover different time series forecasting models, including LLM-, LVM-, VLM-, Transformer-, CNN-, and MLP-based methods. DMMV achieves the best mean squared error (MSE) on 6 out of 8 datasets. Figure 2 presents the ranking of DMMV and the baseline methods in terms of MSE and mean absolute error (MAE), providing an overview of DMMV's performance.


Figure 2: Critical difference (CD) diagram of the average rank of all 16 compared methods in terms of (a) MSE and (b) MAE over all benchmark datasets. A lower rank (toward the left of the scale) is better.
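
For reference, the two error metrics named above follow their standard definitions. The sketch below is a plain-Python illustration with hypothetical function names and made-up numbers, not the repository's evaluation code.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    truth = [1.0, 2.0, 3.0]      # illustrative values only
    forecast = [1.0, 2.0, 5.0]
    print("MSE:", mse(truth, forecast))
    print("MAE:", mae(truth, forecast))
```

MSE penalizes large errors more heavily than MAE, which is why the two rankings in Figure 2 are reported separately.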

🔗 Citation

@inproceedings{shen2025dmmv,
      title={Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting}, 
      author={ChengAo Shen and Wenchao Yu and Ziming Zhao and Dongjin Song and Wei Cheng and Haifeng Chen and Jingchao Ni},
      booktitle={NeurIPS},
      year={2025},
}

📧 Contact

If you have any questions or concerns, please contact us at cshen9 [at] uh [dot] edu or submit an issue.
