DriveCode: Domain Specific Numerical Encoding for LLM-Based Autonomous Driving
- URL: http://arxiv.org/abs/2603.00919v1
- Date: Sun, 01 Mar 2026 04:41:29 GMT
- Title: DriveCode: Domain Specific Numerical Encoding for LLM-Based Autonomous Driving
- Authors: Zhiye Wang, Yanbo Jiang, Rui Zhou, Bo Zhang, Fang Zhang, Zhenhua Xu, Yaqin Zhang, Jianqiang Wang
- Abstract summary: We introduce DriveCode, a numerical encoding method that represents numbers as dedicated embeddings rather than discrete text tokens. DriveCode employs a number projector to map numbers into the language model's hidden space, enabling seamless integration with visual and textual features.
- Score: 24.947943628933036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have shown great promise for autonomous driving. However, discretizing numbers into tokens limits precise numerical reasoning, fails to reflect the positional significance of digits in the training objective, and makes it difficult to achieve both decoding efficiency and numerical precision. These limitations affect both the processing of sensor measurements and the generation of precise control commands, creating a fundamental barrier to deploying LLM-based autonomous driving systems. In this paper, we introduce DriveCode, a novel numerical encoding method that represents numbers as dedicated embeddings rather than discrete text tokens. DriveCode employs a number projector to map numbers into the language model's hidden space, enabling seamless integration with visual and textual features in a unified multimodal sequence. Evaluated on OmniDrive, DriveGPT4, and DriveGPT4-V2 datasets, DriveCode demonstrates superior performance in trajectory prediction and control signal generation, confirming its effectiveness for LLM-based autonomous driving systems.
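The abstract describes the mechanism but includes no code; below is a minimal PyTorch sketch of what a number projector of this kind might look like, with Fourier features standing in for an unknown featurization. All module names, dimensions, and design choices are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a DriveCode-style number projector: map each raw
# scalar to one continuous embedding in the LLM's hidden space, so magnitude
# is no longer split across digit tokens. Not the authors' implementation.
import torch
import torch.nn as nn

class NumberProjector(nn.Module):
    """Maps scalar numbers to embeddings in the LLM's hidden space."""
    def __init__(self, hidden_dim: int = 4096, num_freqs: int = 16):
        super().__init__()
        # Fixed log-spaced frequencies expose magnitude at several scales
        # (an assumed featurization, analogous to Fourier features).
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs))
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_freqs + 1, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch, n_numbers) raw sensor readings or control targets
        x = values.unsqueeze(-1) * self.freqs                            # (B, N, F)
        feats = torch.cat([x.sin(), x.cos(), values.unsqueeze(-1)], -1)  # (B, N, 2F+1)
        return self.mlp(feats)                                           # (B, N, hidden_dim)

proj = NumberProjector(hidden_dim=512)
speeds = torch.tensor([[12.3, -0.7, 45.0]])
embeddings = proj(speeds)  # splice these into the multimodal token sequence
print(embeddings.shape)    # torch.Size([1, 3, 512])
```

The relevant property is that each number enters the sequence as a single embedding, so the model neither pays a per-digit decoding cost nor loses the positional significance of digits to tokenization.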
Related papers
- LAD-Drive: Bridging Language and Trajectory with Action-Aware Diffusion Transformers [15.4994260281059]
We introduce LAD-Drive, a generative framework that disentangles high-level intention from low-level spatial planning. LAD-Drive employs an action decoder to infer a probabilistic meta-action distribution, establishing an explicit belief state that preserves the nuanced intent typically lost by one-hot encodings. Extensive evaluations on the LangAuto benchmark demonstrate that LAD-Drive achieves state-of-the-art results, outperforming competitive baselines by up to 59% in Driving Score.
arXiv Detail & Related papers (2026-03-02T16:21:42Z)
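The "probabilistic meta-action distribution" idea above lends itself to a toy illustration: a decoder head that emits a full distribution over meta-actions instead of a one-hot choice. The action vocabulary, feature sizes, and decoder shape here are invented for illustration and are not taken from the paper.

```python
# Toy sketch: keep a belief state over meta-actions (e.g. 60% keep-lane /
# 40% overtake) instead of collapsing intent to a one-hot label.
import torch
import torch.nn as nn

META_ACTIONS = ["keep_lane", "change_left", "change_right", "overtake", "stop"]

decoder = nn.Sequential(
    nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, len(META_ACTIONS))
)
intent_features = torch.randn(1, 256)          # stand-in for language features
belief = decoder(intent_features).softmax(-1)  # explicit belief state
print({a: round(p.item(), 3) for a, p in zip(META_ACTIONS, belief[0])})
```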
- SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving [38.21244888074097]
SpaceDrive is a spatial-aware driving framework that treats spatial information as explicit positional encodings (PEs) instead of textual digit tokens. We show that SpaceDrive achieves state-of-the-art open-loop performance on the nuScenes dataset and the second-best Driving Score of 78.02 on the Bench2Drive benchmark.
arXiv Detail & Related papers (2025-12-11T14:59:07Z)
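A small sketch of the encoding idea above: a continuous coordinate is mapped to a sinusoidal positional encoding rather than spelled out as digit tokens. This is a generic sinusoidal formulation; SpaceDrive's actual PE design may differ.

```python
# Encode metric (x, y) coordinates as sinusoidal positional encodings,
# bypassing digit tokenization. Frequencies and dimensions are assumptions.
import torch

def coord_positional_encoding(xy: torch.Tensor, dim: int = 64) -> torch.Tensor:
    """xy: (N, 2) metric coordinates -> (N, 2*dim) encoding."""
    freqs = 1.0 / (100.0 ** (torch.arange(dim // 2) / (dim // 2)))
    angles = xy.unsqueeze(-1) * freqs                     # (N, 2, dim/2)
    pe = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (N, 2, dim)
    return pe.flatten(1)                                  # (N, 2*dim)

waypoints = torch.tensor([[3.2, -1.5], [7.8, 0.4]])
print(coord_positional_encoding(waypoints).shape)  # torch.Size([2, 128])
```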
- VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving [90.21844353859454]
We introduce a novel approach featuring a lightweight MLLM architecture with enhanced vision components. VLDrive achieves state-of-the-art driving performance while reducing parameters by 81%.
arXiv Detail & Related papers (2025-11-09T07:14:53Z)
- TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving [10.439455144126617]
TinyDrive is a lightweight VLM for multi-view VQA in driving scenarios. Our model comprises two key components: a multiscale vision encoder and a dual-level prioritization mechanism for tokens and sequences. TinyDrive is first evaluated on our custom-curated VQA dataset, and it is subsequently tested on the public DriveLM benchmark.
arXiv Detail & Related papers (2025-05-21T14:19:24Z)
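A minimal sketch of the token-routing idea above: score visual tokens and forward only the top-k to the language model. TinyDrive's dual-level token/sequence prioritization is more involved; the scorer and token budget below are assumptions.

```python
# Selective token routing sketch: a learned scorer ranks visual tokens and
# only the top-k survive, cutting the LLM's input length.
import torch
import torch.nn as nn

class TokenRouter(nn.Module):
    def __init__(self, dim: int = 256, keep: int = 16):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # per-token importance score
        self.keep = keep

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, dim) multiscale visual tokens
        scores = self.scorer(tokens).squeeze(-1)     # (B, N)
        idx = scores.topk(self.keep, dim=1).indices  # (B, keep)
        return torch.gather(
            tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )

router = TokenRouter()
kept = router(torch.randn(2, 196, 256))
print(kept.shape)  # torch.Size([2, 16, 256])
```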
- SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models [63.71984266104757]
We propose SafeAuto, a framework that enhances MLLM-based autonomous driving by incorporating both unstructured and structured knowledge. To explicitly integrate safety knowledge, we develop a reasoning component that translates traffic rules into first-order logic. Our Multimodal Retrieval-Augmented Generation model leverages video, control signals, and environmental attributes to learn from past driving experiences.
arXiv Detail & Related papers (2025-02-28T21:53:47Z)
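The rule-to-logic step above can be illustrated with a toy example: a traffic rule written as a first-order implication and checked against a planned action. The predicates and the rule below are invented for illustration; SafeAuto's reasoning component is far richer.

```python
# Toy first-order-logic rule check (invented predicates, not SafeAuto code):
# RedLight(s) AND Moving(s) -> MustStop(s), used to veto unsafe actions.
from dataclasses import dataclass

@dataclass
class Scene:
    light: str        # "red" | "yellow" | "green"
    ego_speed: float  # m/s

def must_stop(scene: Scene) -> bool:
    # The implication's antecedent: a red light while the ego is moving.
    return scene.light == "red" and scene.ego_speed > 0.0

def action_is_safe(scene: Scene, planned_accel: float) -> bool:
    # Accelerating while MustStop holds violates the rule.
    return not (must_stop(scene) and planned_accel > 0.0)

print(action_is_safe(Scene("red", 5.0), planned_accel=1.2))    # False
print(action_is_safe(Scene("green", 5.0), planned_accel=1.2))  # True
```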
- Doe-1: Closed-Loop Autonomous Driving with Large World Model [63.99937807085461]
We propose a large Driving wOrld modEl (Doe-1) for unified perception, prediction, and planning. We use free-form texts for perception and generate future predictions directly in the RGB space with image tokens. For planning, we employ a position-aware tokenizer to effectively encode actions into discrete tokens.
arXiv Detail & Related papers (2024-12-12T18:59:59Z)
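Encoding continuous actions as discrete tokens, as Doe-1's tokenizer does, reduces in the simplest case to quantization. The sketch below uses uniform binning; the bin counts and value ranges are assumptions, and the actual position-aware tokenizer is presumably more sophisticated.

```python
# Uniform-binning action tokenizer sketch (assumed ranges and bin counts):
# map continuous (steer, accel) into discrete ids an LLM can emit.
import numpy as np

def tokenize_action(steer: float, accel: float, n_bins: int = 256) -> tuple[int, int]:
    """Map steer in [-1, 1] and accel in [-5, 5] m/s^2 to token ids."""
    s = int(np.clip((steer + 1.0) / 2.0 * (n_bins - 1), 0, n_bins - 1))
    a = int(np.clip((accel + 5.0) / 10.0 * (n_bins - 1), 0, n_bins - 1))
    return s, a

def detokenize_action(s: int, a: int, n_bins: int = 256) -> tuple[float, float]:
    # Inverse map back to continuous controls (up to quantization error).
    return s / (n_bins - 1) * 2.0 - 1.0, a / (n_bins - 1) * 10.0 - 5.0

print(tokenize_action(0.12, -1.3))                       # (142, 94)
print(detokenize_action(*tokenize_action(0.12, -1.3)))   # approx (0.12, -1.3)
```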
- GPD-1: Generative Pre-training for Driving [77.06803277735132]
We propose a unified Generative Pre-training for Driving (GPD-1) model that handles a range of driving tasks within a single framework. We represent each scene with ego, agent, and map tokens and formulate autonomous driving as a unified token generation problem. Our GPD-1 successfully generalizes to various tasks without fine-tuning, including scene generation, traffic simulation, closed-loop simulation, map prediction, and motion planning.
arXiv Detail & Related papers (2024-12-11T18:59:51Z)
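The "unified token generation" formulation above can be sketched in a few lines: ego, agent, and map states are projected into a shared token space and concatenated into one sequence for an autoregressive model. All dimensions and embedding choices here are assumptions.

```python
# Sketch: embed heterogeneous scene elements into one token space so a
# single autoregressive model can serve many driving tasks.
import torch
import torch.nn as nn

dim = 128
embed_ego   = nn.Linear(4, dim)  # (x, y, heading, speed)
embed_agent = nn.Linear(4, dim)  # same state layout per surrounding agent
embed_map   = nn.Linear(2, dim)  # map polyline points

ego     = torch.randn(1, 1, 4)
agents  = torch.randn(1, 6, 4)
map_pts = torch.randn(1, 32, 2)

scene_tokens = torch.cat(
    [embed_ego(ego), embed_agent(agents), embed_map(map_pts)], dim=1
)  # (1, 39, dim) -> input to an autoregressive transformer
print(scene_tokens.shape)
```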
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving [69.82743399946371]
DriveMLM is a framework that can perform closed-loop autonomous driving in realistic simulators. We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system. This model can be plugged into existing AD systems such as Apollo for closed-loop driving.
arXiv Detail & Related papers (2023-12-14T18:59:05Z)
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models [37.910449013471656]
Large language models (LLMs) have shown impressive reasoning capabilities that approach "Artificial General Intelligence". This paper introduces LMDrive, a novel language-guided, end-to-end, closed-loop autonomous driving framework.
arXiv Detail & Related papers (2023-12-12T18:24:15Z)
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model [84.29836263441136]
This study introduces DriveGPT4, a novel interpretable end-to-end autonomous driving system based on multimodal large language models (MLLMs). DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users.
arXiv Detail & Related papers (2023-10-02T17:59:52Z)