TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
- URL: http://arxiv.org/abs/2511.17225v1
- Date: Fri, 21 Nov 2025 13:12:13 GMT
- Title: TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making
- Authors: Shanshan Li, Da Huang, Yu He, Yanwei Fu, Yu-Gang Jiang, Xiangyang Xue
- Abstract summary: Task-Preferenced Multi-Demand-Driven Navigation (TP-MDDN) is a new benchmark for long-horizon navigation involving multiple sub-demands with explicit task preferences. For spatial memory, we design MASMap, which combines 3D point cloud accumulation with 2D semantic mapping for accurate and efficient environmental understanding. Our approach outperforms state-of-the-art baselines in both perception accuracy and navigation robustness.
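The MASMap idea in the summary above (accumulating a 3D point cloud and pairing it with a 2D semantic map) can be illustrated with a minimal top-down projection. This is a hedged sketch only: the paper does not describe its data layout, and the function name, cell size, and highest-point rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_to_semantic_map(points, labels, cell=0.5, size=8):
    """Project labeled (x, y, z) points into a top-down (size x size)
    semantic grid, keeping the label of the highest point per cell.
    Hypothetical stand-in for MASMap's 3D-to-2D step."""
    grid = np.zeros((size, size), dtype=int)   # 0 = unknown cell
    height = np.full((size, size), -np.inf)    # tallest z seen per cell
    for (x, y, z), lab in zip(points, labels):
        i, j = int(x // cell), int(y // cell)
        if 0 <= i < size and 0 <= j < size and z > height[i, j]:
            height[i, j] = z
            grid[i, j] = lab
    return grid
```

A later observation of the same cell overwrites the stored label only if it comes from a higher point, a simple stand-in for fusing accumulated geometry into one semantic layer.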
- Score: 90.18833928208333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In daily life, people often move through spaces to find objects that meet their needs, posing a key challenge in embodied AI. Traditional Demand-Driven Navigation (DDN) handles one need at a time but does not reflect the complexity of real-world tasks involving multiple needs and personal choices. To bridge this gap, we introduce Task-Preferenced Multi-Demand-Driven Navigation (TP-MDDN), a new benchmark for long-horizon navigation involving multiple sub-demands with explicit task preferences. To solve TP-MDDN, we propose AWMSystem, an autonomous decision-making system composed of three key modules: BreakLLM (instruction decomposition), LocateLLM (goal selection), and StatusMLLM (task monitoring). For spatial memory, we design MASMap, which combines 3D point cloud accumulation with 2D semantic mapping for accurate and efficient environmental understanding. Our Dual-Tempo action generation framework integrates zero-shot planning with policy-based fine control, and is further supported by an Adaptive Error Corrector that handles failure cases in real time. Experiments demonstrate that our approach outperforms state-of-the-art baselines in both perception accuracy and navigation robustness.
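The three-module decision loop described in the abstract (BreakLLM decomposes the instruction, LocateLLM selects the next goal, StatusMLLM monitors completion) can be sketched as a simple control loop. The module names follow the paper, but every interface below is a hypothetical rule-based stand-in for what the abstract describes as LLM/MLLM components; this is an illustration of the control flow, not the authors' implementation.

```python
def break_llm(instruction):
    """BreakLLM: decompose a multi-demand instruction into sub-demands
    (stand-in: split on semicolons instead of calling an LLM)."""
    return [part.strip() for part in instruction.split(";") if part.strip()]

def locate_llm(sub_demands, completed):
    """LocateLLM: pick the next goal (stand-in: first unmet sub-demand)."""
    for demand in sub_demands:
        if demand not in completed:
            return demand
    return None

def status_mllm(goal, observation):
    """StatusMLLM: judge whether the current observation satisfies the
    goal (stand-in: keyword match instead of a multimodal LLM)."""
    return goal.split()[-1] in observation

def awm_system(instruction, observations):
    """Run the decompose -> select -> monitor loop over a stream of
    observations, returning the sub-demands completed in order."""
    sub_demands = break_llm(instruction)
    completed = []
    for obs in observations:
        goal = locate_llm(sub_demands, completed)
        if goal is None:
            break
        if status_mllm(goal, obs):
            completed.append(goal)
    return completed
```

For example, `awm_system("find a quiet chair; fetch a red mug", obs_stream)` walks the observation stream and marks each sub-demand done when its keyword appears, mirroring the decompose/select/monitor division of labor.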
Related papers
- UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception [71.19234323863314]
We propose a framework to simultaneously recognize driver behavior, driver emotion, vehicle behavior, and traffic context. Our framework incorporates two core components: dual-branch spatial channel multimodal embedding and adaptive feature-decoupled multi-task loss. We evaluate our method on the AIDE dataset, and the experimental results demonstrate that UV-M3TL achieves state-of-the-art performance across all four tasks.
arXiv Detail & Related papers (2026-02-02T03:35:24Z) - TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving [22.22943635900334]
TEM3-Learning is a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition. It achieves state-of-the-art accuracy across all four tasks while maintaining a lightweight architecture with fewer than 6 million parameters and delivering an impressive 142.32 FPS inference speed.
arXiv Detail & Related papers (2025-06-22T16:12:27Z) - M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving [48.17490295484055]
M3Net is a novel network that simultaneously tackles detection, segmentation, and 3D occupancy prediction for autonomous driving. M3Net achieves state-of-the-art multi-task learning performance on the nuScenes benchmarks.
arXiv Detail & Related papers (2025-03-23T15:08:09Z) - SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [79.58755811919366]
This paper introduces a new task called Multi-Modal datasets and Multi-Task Object Detection (M2Det) for remote sensing. It is designed to accurately detect horizontal or oriented objects from any sensor modality. This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z) - UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height [2.975860548186652]
Occupancy prediction and 3D object detection are two standard tasks in modern autonomous driving systems.
We propose a method, UltimateDO, to achieve fast 3D object detection and occupancy prediction.
arXiv Detail & Related papers (2024-09-17T13:14:13Z) - RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z) - M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly.
Current MTL regimes have to activate nearly the entire model even to execute just a single task.
We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.