Learning to Terminate in Object Navigation
- URL: http://arxiv.org/abs/2309.16164v1
- Date: Thu, 28 Sep 2023 04:32:08 GMT
- Title: Learning to Terminate in Object Navigation
- Authors: Yuhang Song and Anh Nguyen and Chun-Yi Lee
- Abstract summary: This paper tackles the critical challenge of object navigation in autonomous navigation systems.
We propose a novel approach, the Depth-Inference Termination Agent (DITA).
We train our Judge Model in parallel with reinforcement learning and supervise the former efficiently using the reward signal.
- Score: 16.164536630623644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the critical challenge of object navigation in autonomous
navigation systems, particularly focusing on the problem of target approach and
episode termination in environments with long optimal episode length in Deep
Reinforcement Learning (DRL) based methods. While effective in environment
exploration and object localization, conventional DRL methods often struggle
with optimal path planning and termination recognition due to a lack of depth
information. To overcome these limitations, we propose a novel approach, namely
the Depth-Inference Termination Agent (DITA), which incorporates a supervised
model called the Judge Model to implicitly infer object-wise depth and decide
termination jointly with reinforcement learning. We train our Judge Model in
parallel with reinforcement learning and supervise the former efficiently
using the reward signal. Our evaluation shows that the method demonstrates
superior performance: we achieve a 9.3% gain in success rate over our baseline
method across all room types and a 51.2% improvement in long-episode
environments, while maintaining slightly better Success weighted by Path
Length (SPL). Code, resources, and visualizations are available at:
https://github.com/HuskyKingdom/DITA_acml2023
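As a rough, hypothetical illustration of the two ideas named in the abstract, the sketch below shows (a) the standard SPL metric and (b) a minimal judge-model head that predicts whether the termination action should fire, supervised by the reward signal. Every name here (JudgeModel, judge_loss, the 512-d feature input, the positive-reward labeling rule) is our own assumption for exposition, not the authors' API; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn

def spl(successes, shortest_lengths, taken_lengths):
    """Success weighted by Path Length:
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, taken_lengths):
        total += s * l / max(p, l)
    return total / len(successes)

class JudgeModel(nn.Module):
    """Hypothetical binary head: given features of the current
    observation, emit a logit for 'terminate now'."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats):
        return self.head(feats).squeeze(-1)

def judge_loss(judge, feats, terminal_reward):
    # Assumed labeling rule: a positive terminal reward means the
    # episode ended on the target, so termination was correct.
    labels = (terminal_reward > 0).float()
    return nn.functional.binary_cross_entropy_with_logits(
        judge(feats), labels)
```

At inference time the RL policy acts as usual while the judge's prediction gates the termination action, which is one plausible reading of deciding termination jointly with reinforcement learning.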
Related papers
- Preference-Guided Reinforcement Learning for Efficient Exploration [7.83845308102632]
We introduce LOPE: Learning Online with trajectory Preference guidancE, an end-to-end preference-guided RL framework.
Our intuition is that LOPE directly adjusts the focus of online exploration by considering human feedback as guidance.
LOPE outperforms several state-of-the-art methods in both convergence rate and overall performance.
arXiv Detail & Related papers (2024-07-09T02:11:12Z)
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z) - Adaptive trajectory-constrained exploration strategy for deep
- Adaptive trajectory-constrained exploration strategy for deep reinforcement learning [6.589742080994319]
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces.
We propose an efficient adaptive trajectory-constrained exploration strategy for DRL.
We conduct experiments on two large 2D grid world mazes and several MuJoCo tasks.
arXiv Detail & Related papers (2023-12-27T07:57:15Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- CCE: Sample Efficient Sparse Reward Policy Learning for Robotic Navigation via Confidence-Controlled Exploration [72.24964965882783]
Confidence-Controlled Exploration (CCE) is designed to enhance the training sample efficiency of reinforcement learning algorithms for sparse reward settings such as robot navigation.
CCE is based on a novel relationship we provide between gradient estimation and policy entropy.
We demonstrate through simulated and real-world experiments that CCE outperforms conventional methods that employ constant trajectory lengths and entropy regularization.
arXiv Detail & Related papers (2023-06-09T18:45:15Z)
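The CCE abstract above ties gradient-estimation quality to policy entropy and varies trajectory length rather than holding it constant. The exact rule is in the paper; the toy schedule below only conveys the intuition (high entropy means low confidence, which earns longer rollouts), and its scaling formula is entirely our assumption.

```python
import numpy as np

def rollout_length(action_probs, min_len=50, max_len=500):
    """Toy confidence-controlled schedule: map normalized policy entropy
    to a trajectory length. High entropy -> low confidence -> longer
    rollouts for better gradient estimates; confident policies get
    short, cheap rollouts."""
    p = np.clip(np.asarray(action_probs, dtype=float), 1e-8, 1.0)
    entropy = -np.sum(p * np.log(p))
    confidence = 1.0 - entropy / np.log(len(p))  # 1 = deterministic policy
    return int(min_len + (1.0 - confidence) * (max_len - min_len))
```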
- DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV [65.07776277630228]
We propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF).
Particularly, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs.
We also exploit another attention based policy network in our lower-level DRL model to construct the route for each UAV, with the objective to maximize the number of executed tasks.
arXiv Detail & Related papers (2022-08-04T04:35:53Z)
- Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation [78.17108227614928]
We propose a benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation.
We consider both a value-based and a policy-gradient Deep Reinforcement Learning (DRL) approach.
We also propose a verification strategy that checks the behavior of the trained models over a set of desired properties.
arXiv Detail & Related papers (2021-12-16T16:53:56Z)
- CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL [4.014524824655106]
We propose a vision-based reinforcement learning (RL) approach for closed-loop trajectory generation in an arm reaching problem.
Arm trajectory generation is a fundamental robotics problem which entails finding collision-free paths to move the robot's body.
arXiv Detail & Related papers (2021-03-24T15:33:03Z)
- POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments [89.43830036483901]
We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup.
Our POMP method takes as input the current pose of the agent and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
arXiv Detail & Related papers (2020-09-17T08:23:50Z)
- Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments [11.657524999491029]
In this work, we use deep reinforcement learning, combining Q-learning with a neural representation to avoid instability.
Our methodology uses deep Q-learning combined with a rolling-wave planning approach from agile methodology.
Experimental results show that the proposed method enhances the performance of VVN by 55.31 on average for long-distance missions.
arXiv Detail & Related papers (2020-03-23T12:58:58Z)
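The entry above pairs Q-learning with a neural representation to avoid instability, which is the standard deep Q-learning recipe. As a reference point only (the paper's exact network and rolling-wave extension are not shown here), a minimal update step with a frozen target network looks like this:

```python
import torch
import torch.nn as nn

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One deep Q-learning step: regress Q(s, a) toward
    r + gamma * max_a' Q_target(s', a'), with the target network held
    fixed between periodic syncs to stabilize training."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    loss = nn.functional.smooth_l1_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```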
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.