Related papers: CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking

CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking

URL: http://arxiv.org/abs/2507.11334v1
Date: Tue, 15 Jul 2025 14:06:24 GMT
Title: CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking
Authors: Yuehao Huang, Liang Liu, Shuangming Lei, Yukai Ma, Hao Su, Jianbiao Mei, Pengxiang Zhao, Yaqing Gu, Yong Liu, Jiajun Lv,
Abstract summary: We propose CogDDN, a VLM-based framework that emulates the human cognitive and learning mechanisms.<n>CogDDN identifies appropriate target objects by semantically aligning detected objects with the given instructions.<n>It incorporates a dual-process decision-making module, comprising a Heuristic Process for rapid, efficient decisions and an Analytic Process that analyzes past errors.
Score: 22.817457688303513
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mobile robots are increasingly required to navigate and interact within unknown and unstructured environments to meet human demands. Demand-driven navigation (DDN) enables robots to identify and locate objects based on implicit human intent, even when object locations are unknown. However, traditional data-driven DDN methods rely on pre-collected data for model training and decision-making, limiting their generalization capability in unseen scenarios. In this paper, we propose CogDDN, a VLM-based framework that emulates the human cognitive and learning mechanisms by integrating fast and slow thinking systems and selectively identifying key objects essential to fulfilling user demands. CogDDN identifies appropriate target objects by semantically aligning detected objects with the given instructions. Furthermore, it incorporates a dual-process decision-making module, comprising a Heuristic Process for rapid, efficient decisions and an Analytic Process that analyzes past errors, accumulates them in a knowledge base, and continuously improves performance. Chain of Thought (CoT) reasoning strengthens the decision-making process. Extensive closed-loop evaluations on the AI2Thor simulator with the ProcThor dataset show that CogDDN outperforms single-view camera-only methods by 15%, demonstrating significant improvements in navigation accuracy and adaptability. The project page is available at https://yuehaohuang.github.io/CogDDN/.

Related papers

CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs [33.123447047397484]
Object goal navigation (ObjectNav) is a fundamental task in embodied AI, requiring an agent to locate a target object in previously unseen environments.<n>We propose CogNav, a framework designed to mimic the cognitive process using large language models.<n>CogNav significantly improves the success rate of ObjectNav at least by relative 14% over the state-of-the-arts.
arXiv Detail & Related papers (2024-12-11T09:50:35Z)
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance object Navigation (CoIN) is a new task setting where the agent actively resolve uncertainties about the target instance.<n>We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA)<n>First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description.<n>An Interaction Trigger module determines whether to ask a question to the human, continue or halt navigation.
arXiv Detail & Related papers (2024-12-02T08:16:38Z)
Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations [13.529124221397822]
We introduce a novel introspection solution for 2D object detection based on Deep Neural Networks (DNNs) We implement several state-of-the-art (SOTA) introspection mechanisms for error detection in 2D object detection, using one-stage and two-stage object detectors evaluated on KITTI and BDD datasets. Our performance evaluation shows that the proposed introspection solution outperforms SOTA methods, achieving an absolute reduction in the missed error ratio of 9% to 17% in the BDD dataset.
arXiv Detail & Related papers (2024-03-02T10:56:14Z)
Extracting Process-Aware Decision Models from Object-Centric Process Data [54.04724730771216]
This paper proposes the first object-centric decision-mining algorithm called Integrated Object-centric Decision Discovery Algorithm (IODDA) IODDA is able to discover how a decision is structured as well as how a decision is made.
arXiv Detail & Related papers (2024-01-26T13:27:35Z)
Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM) ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm. Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z)
Performance Study of YOLOv5 and Faster R-CNN for Autonomous Navigation around Non-Cooperative Targets [0.0]
This paper discusses how the combination of cameras and machine learning algorithms can achieve the relative navigation task. The performance of two deep learning-based object detection algorithms, Faster Region-based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLOv5) is tested. The paper discusses the path to implementing the feature recognition algorithms and towards integrating them into the spacecraft Guidance Navigation and Control system.
arXiv Detail & Related papers (2023-01-22T04:53:38Z)
Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-intelligent edge artificial-latency (AI) system, which jointly exploits the AI model split inference and integrated sensing and communication (ISAC) We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
Centralizing State-Values in Dueling Networks for Multi-Robot Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Training and Decentralized Execution (CTDE) paradigm. This problem is challenging when each robot considers its path without explicitly sharing observations with other robots. We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
arXiv Detail & Related papers (2021-12-16T16:47:00Z)
Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network. There are two major challenges in the current one-step approaches. We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
Reducing DNN Labelling Cost using Surprise Adequacy: An Industrial Case Study for Autonomous Driving [23.054842564447895]
Deep Neural Networks (DNNs) are rapidly being adopted by the automotive industry, due to their impressive performance in tasks that are essential for autonomous driving. This paper shows how development of a DNN based object segmentation can be improved by exploiting the correlation between Surprise Adequacy (SA) and model performance. In our industrial case study the technique allows cost savings of up to 50% with negligible evaluation inaccuracy.
arXiv Detail & Related papers (2020-05-29T06:33:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.