🔍 About | 🚀 Quick Start | 📊 Evaluation | 🔗 Citation
This is the official repository for the NeurIPS 2025 paper "Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting". The paper proposes DMMV, a decomposition-based multi-modal view (MMV) framework that uses trend-seasonal decomposition and a novel backcast-residual adaptive decomposition to integrate MMVs of time series with large vision models (LVMs) for long-term time series forecasting (LTSF).
Traditional time series forecasting models often rely on a single view (e.g., numerical, language, visual), overlooking the complementary information that can be integrated across different modalities. The proposed Decomposition-based Multi-Modal View (DMMV) framework addresses this limitation by jointly modeling the numerical and visual views of time series within a unified architecture.
As illustrated in Figure 1, DMMV consists of two variants, DMMV-S and DMMV-A.
- DMMV-S: Uses a moving-average kernel to decompose the series into seasonal and trend parts, processed by the Visual and Numerical Forecasters, respectively.
- DMMV-A: Leverages the Visual Forecaster for both forecasting and backcasting to reconstruct seasonal components, while adaptively using the Numerical Forecaster to model the residual trend component.
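
The two decompositions above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names, the kernel size of 25, and the period of 24 are assumptions chosen for illustration, and `periodic_backcast` is a simple per-phase average standing in for the LVM-based Visual Forecaster's backcast.

```python
import numpy as np


def moving_average_decompose(x, kernel_size=25):
    """DMMV-S style split: trend = centered moving average, seasonal = remainder.

    kernel_size is an assumed value, not taken from the repository.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the window is centered")
    pad = kernel_size // 2
    # Repeat the end points so the trend keeps the length of x.
    padded = np.concatenate([np.full(pad, x[0]), x, np.full(pad, x[-1])])
    kernel = np.ones(kernel_size) / kernel_size
    trend = np.convolve(padded, kernel, mode="valid")
    return trend, x - trend


def periodic_backcast(x, period):
    """Hypothetical stand-in for the Visual Forecaster's backcast.

    Averages the values at each phase of the period and tiles them back.
    """
    phase = np.arange(len(x)) % period
    phase_mean = np.array([x[phase == p].mean() for p in range(period)])
    return phase_mean[phase]


# Synthetic look-back window: linear trend plus a 24-step cycle.
t = np.arange(96, dtype=float)
series = 0.05 * t + np.sin(2 * np.pi * t / 24)

# DMMV-S style: fixed moving-average kernel.
trend, seasonal = moving_average_decompose(series, kernel_size=25)

# DMMV-A style: subtract the backcast; the residual goes to the numerical model.
backcast = periodic_backcast(series, period=24)
residual = series - backcast
```

In both cases the two parts sum back to the input window, so the split itself discards no information.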
Both variants share the following core components:
- Visual Forecaster: Utilizes a pre-trained LVM to reconstruct the masked regions of input imaged time series, effectively capturing periodic and local patterns.
- Numerical Forecaster: A general series-to-series predictor that models global trends. It can be implemented as a linear-layer or Transformer-based forecaster that reads the numerical view of time series.
- Fusion Gate: An adaptive gating mechanism that integrates the outputs from both forecasters, balancing trend and periodic information to produce the final forecast.

Key features of the framework:
- Multi-Modal Integration: Jointly models numerical and visual views of time series while making use of the strengths of LVM forecasters and numerical forecasters.
- Decomposition: Introduces a novel adaptive backcast–residual decomposition framework that can harness LVMs’ inductive biases.
- Modular Compatibility: Supports various LVMs (e.g., MAE, SimMIM) and numerical forecasters (e.g., Linear, PatchTST) for flexible deployment.
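
The Fusion Gate described above can be sketched as a convex combination of the two forecasts. This README does not specify the gate's exact form, so the scalar gate, the sigmoid parameterization, and the names `fuse` and `gate_logit` are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fuse(visual_forecast, numerical_forecast, gate_logit):
    """Blend two forecasts with an assumed scalar gate.

    In a trained model gate_logit would be a learnable parameter;
    here it is passed in directly.
    """
    g = sigmoid(gate_logit)
    return g * visual_forecast + (1.0 - g) * numerical_forecast


# Toy forecasts over a 4-step horizon (made-up values).
visual = np.array([1.0, -1.0, 1.0, -1.0])     # periodic pattern
numerical = np.array([0.1, 0.2, 0.3, 0.4])    # slow trend

# A logit of 0 gives equal weight to both forecasters.
fused = fuse(visual, numerical, gate_logit=0.0)
```

A large positive `gate_logit` pushes the output toward the visual forecast, and a large negative one toward the numerical forecast.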
- Clone the Repository

      git clone https://github.com/D2I-Group/dmmv.git
      cd dmmv

- Set Up the Environment
  - We recommend using `conda` or `virtualenv` to create an isolated environment.

        python3 -m venv venv
        source venv/bin/activate  # or .\venv\Scripts\activate on Windows
        pip install -r requirements.txt

- Download the datasets
  - You can obtain the pre-processed datasets from the Google Drive provided by Time-Series-Library.
  - Then place the downloaded data in the folder `./dataset`.
  - Here is a summary of the benchmark datasets.
- Run the Code
  - Make sure the environment and settings are correctly configured.
  - Run `bash scripts/DMMV-A/ETTh1.sh` to start.
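
Before launching a script, it can help to confirm that the data landed where the code expects it. The helper below is a hypothetical convenience, not part of the repository. It assumes that the downloaded files are CSVs somewhere under `./dataset`, which this README does not state.

```python
from pathlib import Path


def list_csv_files(root):
    """Return CSV paths under root (relative, POSIX style), sorted."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p.relative_to(root_path).as_posix() for p in root_path.rglob("*.csv")
    )


found = list_csv_files("./dataset")
if found:
    print(f"Found {len(found)} CSV file(s) under ./dataset:")
    for name in found:
        print("  " + name)
else:
    print("No CSV files found under ./dataset; check the download step.")
```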
DMMV is compared with 14 state-of-the-art (SOTA) models on 8 benchmark datasets spanning multiple domains. The baselines include LLM-, LVM-, VLM-, Transformer-, CNN-, and MLP-based forecasting methods. DMMV achieves the best mean squared error (MSE) on 6 out of 8 datasets. Figure 2 ranks DMMV and the baselines by MSE and mean absolute error (MAE), giving an overview of DMMV's performance.
Figure 2: Critical difference (CD) diagram on the average rank of all 16 compared methods in terms of (a) MSE and (b) MAE over all benchmark datasets. The lower rank (left of the scale) is better.
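
For readers unfamiliar with the quantities behind Figure 2, the sketch below shows how MSE, MAE, and an average rank across datasets can be computed. It covers only the averaging step, not the statistical test that a CD diagram also involves. The error table is made-up toy data, not results from the paper, and the function names are assumptions.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error between two equally shaped arrays."""
    return float(np.mean(np.abs(y_true - y_pred)))


def average_rank(errors):
    """Average per-dataset rank of each method (1 = lowest error).

    errors has shape (n_datasets, n_methods). Ties are not averaged;
    they are broken by column order, which is a simplification.
    """
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)


# Toy error table: 3 datasets (rows) x 3 methods (columns), made-up values.
toy_errors = np.array([
    [0.40, 0.38, 0.45],
    [0.30, 0.33, 0.31],
    [0.20, 0.18, 0.25],
])
avg_ranks = average_rank(toy_errors)

# Toy forecast to exercise the two metrics.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
toy_mse = mse(y_true, y_pred)
toy_mae = mae(y_true, y_pred)
```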
@inproceedings{shen2025dmmv,
title={Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting},
author={ChengAo Shen and Wenchao Yu and Ziming Zhao and Dongjin Song and Wei Cheng and Haifeng Chen and Jingchao Ni},
booktitle={NeurIPS},
year={2025},
}
If you have any questions or concerns, please contact us at cshen9 [at] uh [dot] edu or submit an issue.



