
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model

arXiv paper · 🤗 HuggingFace models · 🤗 HuggingFace datasets

Songyan Zhang1*, Wenhui Huang1*, Zihui Gao2, Hao Chen2, Chen Lv1†

Nanyang Technological University1, Zhejiang University2

*Equal Contributions, †Corresponding Author


An overview of the framework of our WiseAD.

✨Capabilities

An overview of the capabilities of WiseAD, a specialized vision-language model for end-to-end autonomous driving with extensive fundamental driving knowledge. Given a video clip, WiseAD can answer various driving-related questions and perform knowledge-augmented trajectory planning toward the target waypoints.

🦙 Data & Model Zoo

Our WiseAD is built on MobileVLM V2 1.7B and fine-tuned on a mixture of datasets, including LingoQA, DRAMA, and Carla, which can be downloaded from their respective sites.
Our WiseAD is now available on Hugging Face. Enjoy playing with it!
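
To grab the released checkpoint ahead of time, the standard Hugging Face CLI works; a minimal sketch, assuming the repo id from the evaluation example later in this README and an arbitrary local directory:

    # Download the WiseAD weights to a local folder (requires huggingface_hub).
    huggingface-cli download wyddmw/WiseAD --local-dir checkpoints/WiseAD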

🛠️ Install

  1. Clone this repository and navigate to the WiseAD folder

    git clone https://github.com/wyddmw/WiseAD.git
    cd WiseAD
  2. Install packages

    conda create -n wisead python=3.10 -y
    conda activate wisead
    pip install --upgrade pip
    pip install torch==2.0.1
    pip install -r requirements.txt
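
Optionally, a quick sanity check that the pinned PyTorch build is importable and, if you have a GPU, that CUDA is visible:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"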

🗝️ Quick Start

Example of answering driving-related questions.

python run_infr.py

🪜 Training & Evaluation

Datasets

The datasets used to train WiseAD are LingoQA, DRAMA, and Carla.

We provide our training-data JSONs on Hugging Face. Note that for the DRAMA dataset, users are required to apply for permission via an application email. The datasets are organized in the following structure:

data
├── carla
│   ├── DATASET
│   │   ├── routes_town01_long_w1...
│   │   └── routes_town01_long_w2...
│   └── carla_qa.json
├── DRAMA
│   ├── drama_data
│   │    ├── combined
│   │    │   ├── 2020-0127-132751
│   │    │   ├── 2020-0129-105040
│   │    │   └── ...
│   └── DRAMA_qa.json
└── LingoQA
    ├── action
    │   └── images
    ├── evaluation
    │   └── images
    ├── scenery
    │   └── images
    ├── training_data.json
    └── evaluation_data.json

It is recommended to symlink your dataset root to data. A minimal example, assuming your datasets live under /path/to/datasets:
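
    # Link an existing dataset root into the repo as ./data
    ln -s /path/to/datasets ./data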

Launch training with one click!

bash launch.sh
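
To pin training to specific GPUs, the standard CUDA environment variable can be set before the launcher (generic CUDA/PyTorch behavior, not a WiseAD-specific option):

    CUDA_VISIBLE_DEVICES=0,1 bash launch.sh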

Evaluate on the LingoQA dataset.

sh eval/LingoQA/eval_lingoqa.sh /path/to/WiseAD/checkpoint /path/to/save/predictions
# An example: 
# sh eval/LingoQA/eval_lingoqa.sh wyddmw/WiseAD /home/spyder/WiseAD/eval_results

The predictions will be saved to /path/to/save/predictions/LingoQA_results.json and scored with the Lingo-Judge metric.
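
To take a quick look at the saved predictions (a sketch; the exact structure of LingoQA_results.json is not documented here, so the indexing below is an assumption):

    # Print the first record of the results file to inspect its fields.
    python -c "import json; r = json.load(open('/path/to/save/predictions/LingoQA_results.json')); print(r[0] if isinstance(r, list) else list(r.items())[0])"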

🔨 TODO LIST

  • [✓] Release Hugging Face model and inference demo.
  • [✓] Release training data and code.
  • [ ] Carla closed-loop evaluation (coming soon).

Reference

We appreciate the awesome open-source projects MobileVLM and LMDrive.

✏️ Citation

If you find WiseAD useful in your research or applications, please consider giving it a star ⭐ and citing it with the following BibTeX:

@article{zhang2024wisead,
  title={WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model},
  author={Zhang, Songyan and Huang, Wenhui and Gao, Zihui and Chen, Hao and Lv, Chen},
  journal={arXiv preprint arXiv:2412.09951},
  year={2024}
}

About

This is the official implementation of WiseAD.
