WHALES: A Multi-agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving
- URL: http://arxiv.org/abs/2411.13340v2
- Date: Tue, 17 Jun 2025 03:01:23 GMT
- Title: WHALES: A Multi-agent Scheduling Dataset for Enhanced Cooperation in Autonomous Driving
- Authors: Richard Wang, Siwei Chen, Ziyi Song, Sheng Zhou
- Abstract summary: We present WHALES, the first large-scale V2X dataset designed to benchmark communication-aware agent scheduling and scalable cooperative perception. WHALES establishes a new state-of-the-art (SOTA) standard with an average of 8.4 cooperative agents per scene and 2.01 million annotated 3D objects spanning diverse traffic scenarios. To further advance the field, we propose the Coverage-Aware Historical Scheduler (CAHS), a novel scheduling baseline that prioritizes agents based on historical viewpoint coverage.
- Score: 13.290191462007668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative perception research is constrained by the scarcity of datasets that capture the complexity of real-world Vehicle-to-Everything (V2X) interactions, particularly under dynamic communication constraints. To address this, we present WHALES (Wireless enhanced Autonomous vehicles with Large number of Engaged agents), the first large-scale V2X dataset specifically designed to benchmark communication-aware agent scheduling and scalable cooperative perception. WHALES establishes a new state-of-the-art (SOTA) standard with an average of 8.4 cooperative agents per scene and 2.01 million annotated 3D objects spanning diverse traffic scenarios. It integrates communication metadata to simulate real-world communication bottlenecks, enabling rigorous evaluation of scheduling strategies. To further advance the field, we propose the Coverage-Aware Historical Scheduler (CAHS), a novel scheduling baseline that prioritizes agents based on historical viewpoint coverage, improving perception performance over existing SOTA methods. WHALES bridges the gap between simulated and real-world V2X challenges, offering a robust framework to explore perception-scheduling co-design, cross-data generalization, and scalability limits. The WHALES dataset and code are available at: https://github.com/chensiweiTHU/WHALES.
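The abstract describes CAHS as prioritizing agents by historical viewpoint coverage. A minimal sketch of that idea is a greedy selection loop that, under a communication budget, repeatedly picks the agent whose past viewpoints add the most previously unseen coverage. The agent names, grid-cell coverage representation, and `budget` parameter below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch of coverage-aware scheduling in the spirit of CAHS.
# Each agent's history is modeled as a set of observed grid cells; the
# real CAHS operates on viewpoint coverage, not this toy representation.

def cahs_schedule(history, candidates, budget):
    """Greedily pick up to `budget` agents, preferring those whose
    historical viewpoints cover the most not-yet-covered cells."""
    covered = set()
    selected = []
    for _ in range(min(budget, len(candidates))):
        # Score each remaining candidate by the *new* coverage it adds.
        best = max(
            (a for a in candidates if a not in selected),
            key=lambda a: len(history[a] - covered),
        )
        selected.append(best)
        covered |= history[best]
    return selected

history = {
    "veh_1": {(0, 0), (0, 1), (1, 0)},
    "veh_2": {(0, 1), (1, 1)},
    "rsu_1": {(2, 2), (2, 3), (3, 3), (0, 0)},
}
print(cahs_schedule(history, list(history), budget=2))  # → ['rsu_1', 'veh_1']
```

Greedy marginal-coverage selection like this is a standard heuristic for budgeted coverage problems; the paper's scheduler may weight or decay history differently.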
Related papers
- World Model-Based Learning for Long-Term Age of Information Minimization in Vehicular Networks [53.98633183204453]
In this paper, a novel world model-based learning framework is proposed to minimize packet-completeness-aware age of information (CAoI) in a vehicular network. A world model framework is proposed to jointly learn a dynamic model of the mmWave V2X environment and use it to imagine trajectories for learning how to perform link scheduling. In particular, the long-term policy is learned in differentiable imagined trajectories instead of environment interactions.
arXiv Detail & Related papers (2025-05-03T06:23:18Z) - V2X-ReaLO: An Open Online Framework and Dataset for Cooperative Perception in Reality [13.68645389910716]
We introduce V2X-ReaLO, an open online cooperative perception framework deployed on real vehicles and smart infrastructure. We present an open benchmark dataset designed to assess the performance of online cooperative perception systems.
arXiv Detail & Related papers (2025-03-13T04:31:20Z) - V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models [31.537045261401666]
We propose a novel problem setting that integrates a Multi-Modal Large Language Model into cooperative autonomous driving.
We also propose our baseline method Vehicle-to-Vehicle Multi-Modal Large Language Model (V2V-LLM)
Experimental results show that our proposed V2V-LLM can be a promising unified model architecture for performing various tasks in cooperative autonomous driving.
arXiv Detail & Related papers (2025-02-14T08:05:41Z) - Deep Reinforcement Learning-Based User Scheduling for Collaborative Perception [24.300126250046894]
Collaborative perception is envisioned to improve perceptual accuracy by using vehicle-to-everything (V2X) communication. Due to limited communication resources, it is impractical for all units to transmit sensing data such as point clouds or high-definition video. We propose a deep reinforcement learning-based V2X user scheduling algorithm for collaborative perception.
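The snippet above frames scheduling as learning which users to grant scarce V2X resources. As a toy stand-in for the paper's deep RL scheduler, the sketch below uses an epsilon-greedy bandit that learns which cooperating user's data yields the highest perception gain; the user names, reward values, and noise model are made-up assumptions for illustration only:

```python
import random

# Toy sketch of learning-based user scheduling: an epsilon-greedy
# bandit estimates each user's perception gain from noisy rewards.
# The actual paper uses deep RL over V2X links, not a tabular bandit.

def schedule_users(true_gain, steps=2000, eps=0.1, seed=0):
    rng = random.Random(seed)
    users = list(true_gain)
    q = {u: 0.0 for u in users}  # running estimate of each user's gain
    n = {u: 0 for u in users}    # times each user has been scheduled
    for _ in range(steps):
        if rng.random() < eps:
            u = rng.choice(users)      # explore a random user
        else:
            u = max(users, key=q.get)  # exploit the best estimate
        reward = true_gain[u] + rng.gauss(0, 0.05)  # noisy observed gain
        n[u] += 1
        q[u] += (reward - q[u]) / n[u]  # incremental mean update
    return max(users, key=q.get)

gains = {"veh_A": 0.2, "veh_B": 0.5, "rsu_1": 0.35}
print(schedule_users(gains))  # typically converges to "veh_B"
```

The incremental-mean update is the standard sample-average estimator; a deep RL scheduler replaces the table with a neural network conditioned on channel and perception state.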
arXiv Detail & Related papers (2025-02-12T04:45:00Z) - V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction [44.40410127660706]
Vehicle-to-everything (V2X) technologies address the limitations of constrained observability in single-vehicle systems. We focus on one-step and multi-step communication strategies (when to transmit) as well as examine their integration with three fusion strategies. Our framework outperforms state-of-the-art methods in both perception and prediction tasks.
arXiv Detail & Related papers (2024-12-02T18:55:34Z) - Self-Supervised State Space Model for Real-Time Traffic Accident Prediction Using eKAN Networks [18.385759762991896]
SSL-eKamba is an efficient self-supervised framework for traffic accident prediction.
To enhance generalization, we design two self-supervised auxiliary tasks that adaptively improve traffic pattern representation.
Experiments on two real-world datasets demonstrate that SSL-eKamba consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-09-09T14:25:51Z) - Scaling Large Language Model-based Multi-Agent Collaboration [72.8998796426346]
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses the performance of each individual agent through collective reasoning.
This study explores whether the continuous addition of collaborative agents can yield similar benefits.
arXiv Detail & Related papers (2024-06-11T11:02:04Z) - AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z) - AutoAgents: A Framework for Automatic Agent Generation [27.74332323317923]
AutoAgents is an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks.
Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods.
arXiv Detail & Related papers (2023-09-29T14:46:30Z) - End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z) - Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving [100.3848723827869]
We present an effective multi-task framework, VE-Prompt, which introduces visual exemplars via task-specific prompting.
Specifically, we generate visual exemplars based on bounding boxes and color-based markers, which provide accurate visual appearances of target categories.
We bridge transformer-based encoders and convolutional layers for efficient and accurate unified perception in autonomous driving.
arXiv Detail & Related papers (2023-03-03T08:54:06Z) - DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving [19.66714697653504]
Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving.
The lack of datasets has severely blocked the development of collaborative perception algorithms.
We release DOLPHINS: dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving.
arXiv Detail & Related papers (2022-07-15T17:07:07Z) - COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles [54.61668577827041]
We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving.
Our experiments on AutoCastSim suggest that our cooperative perception driving models lead to a 40% improvement in average success rate.
arXiv Detail & Related papers (2022-05-04T17:55:12Z) - Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent [2.512827436728378]
We propose a novel deep learning model trained with end-to-end and multi-task learning manners to perform both perception and control tasks simultaneously.
The model is evaluated on the CARLA simulator with various scenarios composed of normal and adversarial situations and different weather conditions to mimic real-world conditions.
arXiv Detail & Related papers (2022-04-12T03:57:01Z) - V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer [58.71845618090022]
We build a holistic attention model, namely V2X-ViT, to fuse information across on-road agents.
V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention.
To validate our approach, we create a large-scale V2X perception dataset.
arXiv Detail & Related papers (2022-03-20T20:18:25Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Anomaly Detection in Multi-Agent Trajectories for Automated Driving [2.5211566369910967]
Similar to humans, automated vehicles are supposed to perform anomaly detection.
Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents.
arXiv Detail & Related papers (2021-10-15T08:07:31Z) - Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization [74.34699679568818]
Weakly supervised temporal action localization (WS-TAL) is a challenging task that aims to localize action instances in the given video with video-level categorical supervision.
We propose a cross-modal consensus network (CO2-Net) to tackle this problem.
arXiv Detail & Related papers (2021-07-27T04:21:01Z) - Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms [57.21078336887961]
Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day.
We propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks.
arXiv Detail & Related papers (2021-05-18T19:22:24Z) - SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving [96.50297622371457]
Multi-agent interaction is a fundamental aspect of autonomous driving in the real world.
Despite more than a decade of research and development, the problem of how to interact with diverse road users in diverse scenarios remains largely unsolved.
We develop a dedicated simulation platform called SMARTS that generates diverse and competent driving interactions.
arXiv Detail & Related papers (2020-10-19T18:26:10Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.