
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model

arXiv paper · 🤗 HuggingFace models · 🤗 HuggingFace datasets

Songyan Zhang1*, Wenhui Huang1*, Zihui Gao2, Hao Chen2, Chen Lv1†

Nanyang Technological University1, Zhejiang University2

*Equal Contributions, †Corresponding Author


An overview of the framework of our WiseAD.

✨Capabilities

An overview of the capabilities of WiseAD, a specialized vision-language model for end-to-end autonomous driving with extensive fundamental driving knowledge. Given a video clip, WiseAD can answer various driving-related questions and perform knowledge-augmented trajectory planning toward the target waypoints.

🦙 Data & Model Zoo

Our WiseAD is built on MobileVLM V2 1.7B and fine-tuned on a mixture of datasets, including LingoQA, DRAMA, and Carla, which can be downloaded from their respective sites.
Our WiseAD is now available on Hugging Face. Enjoy playing with it!
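
To grab the released checkpoint ahead of time, the standard Hugging Face CLI works; a minimal sketch, assuming the repo id from the evaluation example later in this README and an arbitrary local directory:

    # Download the WiseAD weights to a local folder (requires huggingface_hub).
    huggingface-cli download wyddmw/WiseAD --local-dir checkpoints/WiseAD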

🛠️ Install

  1. Clone this repository and navigate to the WiseAD folder

    git clone https://github.com/wyddmw/WiseAD.git
    cd WiseAD
  2. Install packages

    conda create -n wisead python=3.10 -y
    conda activate wisead
    pip install --upgrade pip
    pip install torch==2.0.1
    pip install -r requirements.txt
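
Optionally, a quick sanity check that the pinned PyTorch build is importable and, if you have a GPU, that CUDA is visible:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"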

🗝️ Quick Start

Example of answering driving-related questions.

python run_infr.py

🪜 Training & Evaluation

Datasets

The datasets used to train WiseAD are LingoQA, DRAMA, and Carla.

We provide our training-data JSONs on Hugging Face. Note that for the DRAMA dataset, users are required to apply for permission via an application email. The datasets are organized in the following structure:

data
├── carla
│   ├── DATASET
│   │   ├── routes_town01_long_w1...
│   │   └── routes_town01_long_w2...
│   └── carla_qa.json
├── DRAMA
│   ├── drama_data
│   │    ├── combined
│   │    │   ├── 2020-0127-132751
│   │    │   ├── 2020-0129-105040
│   │    │   └── ...
│   └── DRAMA_qa.json
└── LingoQA
    ├── action
    │   └── images
    ├── evaluation
    │   └── images
    ├── scenery
    │   └── images
    ├── training_data.json
    └── evaluation_data.json

It is recommended to symlink your dataset root to data. A minimal example, assuming your datasets live under /path/to/datasets:
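
    # Link an existing dataset root into the repo as ./data
    ln -s /path/to/datasets ./data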

Launch training with one click!

bash launch.sh
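
To pin training to specific GPUs, the standard CUDA environment variable can be set before the launcher (generic CUDA/PyTorch behavior, not a WiseAD-specific option):

    CUDA_VISIBLE_DEVICES=0,1 bash launch.sh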

Evaluate on the LingoQA dataset.

sh eval/LingoQA/eval_lingoqa.sh /path/to/WiseAD/checkpoint /path/to/save/predictions
# An example: 
# sh eval/LingoQA/eval_lingoqa.sh wyddmw/WiseAD /home/spyder/WiseAD/eval_results

The predictions will be saved to /path/to/save/predictions/LingoQA_results.json and scored with the Lingo-Judge metric.
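
To take a quick look at the saved predictions (a sketch; the exact structure of LingoQA_results.json is not documented here, so the indexing below is an assumption):

    # Print the first record of the results file to inspect its fields.
    python -c "import json; r = json.load(open('/path/to/save/predictions/LingoQA_results.json')); print(r[0] if isinstance(r, list) else list(r.items())[0])"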

🔨 TODO LIST

  • [✓] Release Hugging Face model and inference demo.
  • [✓] Release training data and code.
  • [ ] Carla closed-loop evaluation (coming soon).

Reference

We appreciate the awesome open-source projects MobileVLM and LMDrive.

✏️ Citation

If you find WiseAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:

@article{zhang2024wisead,
  title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
  author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
  journal={arXiv preprint arXiv:2412.09951},
  year={2024}
}

About

This is the official implementation of WiseAD.
