Off-Policy Evaluation with Online Adaptation for Robot Exploration in
Challenging Environments
- URL: http://arxiv.org/abs/2204.03140v3
- Date: Wed, 24 May 2023 20:18:32 GMT
- Title: Off-Policy Evaluation with Online Adaptation for Robot Exploration in
Challenging Environments
- Authors: Yafei Hu, Junyi Geng, Chen Wang, John Keller, and Sebastian Scherer
- Abstract summary: This paper presents a method to learn how "good" states are, as measured by the state value function, to provide guidance for robot exploration.
It combines offline Monte-Carlo training on real-world data with Temporal Difference (TD) online adaptation to optimize the trained value estimator.
Results show that our method enables the robot to predict the value of future states and thus better guide its exploration.
- Score: 6.4617907823964345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous exploration has many important applications. However, classic
information gain-based or frontier-based exploration relies only on the robot's
current state to determine the immediate exploration goal, which lacks the
capability of predicting the value of future states and thus leads to
inefficient exploration decisions. This paper presents a method to learn how
"good" states are, as measured by the state value function, to provide guidance
for robot exploration in real-world challenging environments. We formulate our
work as an off-policy evaluation (OPE) problem for robot exploration (OPERE).
It combines offline Monte-Carlo training on real-world data with Temporal
Difference (TD) online adaptation to optimize the trained value estimator. We
also design an intrinsic reward function based on sensor information coverage,
which enables the robot to gain more information even when extrinsic rewards
are sparse. Results show that our method enables the robot to predict the
value of future states and thus better guide its exploration. The proposed
algorithm achieves better prediction and exploration performance than
state-of-the-art methods. To the best of our knowledge, this work is the first
to demonstrate value function prediction on a real-world dataset for robot
exploration in challenging subterranean and urban environments. More details
and demo videos can be found at https://jeffreyyh.github.io/opere/.
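To make the two-stage pipeline described in the abstract concrete, here is a minimal sketch of offline Monte-Carlo value regression followed by online TD(0) adaptation. The MLP shape, flat state features, optimizers, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNet(nn.Module):
    """Small MLP mapping a state feature vector to a scalar value V(s)."""
    def __init__(self, state_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)

def offline_mc_training(value_net, optimizer, states, mc_returns, epochs=100):
    """Stage 1: regress V(s) onto Monte-Carlo returns from logged real-world trajectories."""
    for _ in range(epochs):
        loss = F.mse_loss(value_net(states), mc_returns)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def td_online_adaptation(value_net, optimizer, s, r, s_next, gamma=0.99):
    """Stage 2: one TD(0) update per transition (s, r, s') observed during deployment."""
    with torch.no_grad():
        target = r + gamma * value_net(s_next)  # bootstrapped TD target
    loss = F.mse_loss(value_net(s), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

During exploration, candidate goals could then be ranked by their predicted value; how the estimator is combined with the planner is not specified in the abstract.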
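The intrinsic reward is described only at a high level (sensor information coverage). Below is a hedged sketch of one plausible formulation, counting map voxels newly covered by each scan; the voxelization scheme and reward scale are assumptions.

```python
import numpy as np

def intrinsic_coverage_reward(seen_voxels: set, scan_points: np.ndarray,
                              voxel_size: float = 0.2) -> float:
    """Reward = number of voxels newly covered by the current sensor scan.

    seen_voxels: set of voxel index tuples observed so far (updated in place).
    scan_points: (N, 3) array of points from the current range-sensor scan.
    """
    new_voxels = {tuple(v) for v in np.floor(scan_points / voxel_size).astype(int)}
    gained = new_voxels - seen_voxels
    seen_voxels |= gained
    return float(len(gained))  # dense learning signal even when extrinsic reward is sparse
```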
Related papers
- Explore until Confident: Efficient Exploration for Embodied Question Answering [32.27111287314288]
We leverage the strong semantic reasoning capabilities of large vision-language models to efficiently explore and answer questions.
We propose a method that first builds a semantic map of the scene from depth information and visual prompting of a VLM.
Next, we use conformal prediction to calibrate the VLM's question answering confidence, allowing the robot to know when to stop exploration.
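As a hedged sketch of the conformal-calibration idea in this entry: compute a quantile of nonconformity scores on a held-out calibration set, form prediction sets at test time, and stop exploring once the set shrinks to a single answer. The score definition and stopping rule below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray,
                        alpha: float = 0.1) -> float:
    """cal_probs: (N, K) softmax scores; cal_labels: (N,) true answer indices."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q_level, method="higher"))

def prediction_set(probs: np.ndarray, qhat: float) -> list:
    """All answers whose nonconformity score falls within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= qhat]

# The robot keeps exploring until len(prediction_set(probs, qhat)) == 1.
```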
arXiv Detail & Related papers (2024-03-23T22:04:03Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices by learning both to do and to undo the task, while simultaneously inferring the reward function from the demonstrations.
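The do/undo loop can be sketched as alternating a forward policy that attempts the task with a backward policy that returns the robot toward initial states, removing the need for manual resets. The minimal environment interface, reward_fn, and update callables below are assumed placeholders; MEDAL++'s actual reward inference and RL updates are abstracted away.

```python
from typing import Any, Callable

def autonomous_practice(env: Any,
                        forward_policy: Callable, backward_policy: Callable,
                        update: Callable, reward_fn: Callable,
                        episodes: int = 100, horizon: int = 200) -> None:
    """Alternate task attempts ('do') with learned resets ('undo')."""
    obs = env.reset()  # a single manual reset; afterwards the robot resets itself
    for ep in range(episodes):
        policy = forward_policy if ep % 2 == 0 else backward_policy
        for _ in range(horizon):
            action = policy(obs)
            next_obs, terminated = env.step(action)  # assumed minimal env interface
            r = reward_fn(next_obs)            # stand-in for the inferred reward
            update(obs, action, r, next_obs)   # any off-policy RL update
            obs = next_obs
            if terminated:
                break
```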
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
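A hedged sketch of this entry's reward construction: a triplet-style time-contrastive loss that pulls temporally close frames together in embedding space, with reward defined as negative distance to a goal embedding. The encoder architecture and margin are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def time_contrastive_loss(anchor, positive, negative, margin=0.2):
    """Pull temporally nearby frames together; push distant frames apart."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

def embedding_reward(encoder, obs, goal):
    """Reward = negative embedding distance to the goal observation."""
    with torch.no_grad():
        z, z_goal = encoder(obs), encoder(goal)
    return -torch.norm(z - z_goal, dim=-1)
```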
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Incremental 3D Scene Completion for Safe and Efficient Exploration Mapping and Planning [60.599223456298915]
We propose a novel way to integrate deep learning into exploration by leveraging 3D scene completion for informed, safe, and interpretable mapping and planning.
We show that our method can speed up coverage of an environment by 73% compared to the baselines with only minimal reduction in map accuracy.
Even if scene completions are not included in the final map, we show that they can be used to guide the robot to choose more informative paths, speeding up the measurement of the scene with the robot's sensors by 35%.
arXiv Detail & Related papers (2022-08-17T14:19:33Z)
- Domain and Modality Gaps for LiDAR-based Person Detection on Mobile Robots [91.01747068273666]
This paper studies existing LiDAR-based person detectors with a particular focus on mobile robot scenarios.
Experiments revolve around the domain gap between driving and mobile robot scenarios, as well as the modality gap between 3D and 2D LiDAR sensors.
Results provide practical insights into LiDAR-based person detection and facilitate informed decisions for relevant mobile robot designs and applications.
arXiv Detail & Related papers (2021-06-21T16:35:49Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
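The information-bottleneck regularization mentioned here can be sketched as a Gaussian goal encoder with a KL penalty toward a unit prior, so that goal codes stay compact. The feature and latent dimensions and the downstream task head are assumptions for illustration, not this paper's architecture.

```python
import torch
import torch.nn as nn

class LatentGoalEncoder(nn.Module):
    """Gaussian encoder over goal features with a KL information-bottleneck term."""
    def __init__(self, feat_dim: int = 128, z_dim: int = 16):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.log_var = nn.Linear(feat_dim, z_dim)

    def forward(self, goal_feat):
        mu, log_var = self.mu(goal_feat), self.log_var(goal_feat)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization trick
        # KL(N(mu, sigma^2) || N(0, I)), averaged over the batch
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1).mean()
        return z, kl  # total loss: task_loss(z) + beta * kl
```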
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Low Dimensional State Representation Learning with Reward-shaped Priors [7.211095654886105]
We propose a method that aims to learn a mapping from observations to a lower-dimensional state space.
This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task.
We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.
arXiv Detail & Related papers (2020-07-29T13:00:39Z)
- Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs [5.043563227694137]
We consider an autonomous exploration problem in which a range-sensing mobile robot is tasked with efficiently and accurately mapping the landmarks of an a priori unknown environment in real time.
We propose a novel approach that uses graph neural networks (GNNs) in conjunction with deep reinforcement learning (DRL), enabling decision-making over graphs containing exploration information to predict a robot's optimal sensing action in belief space.
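As a hedged sketch of the GNN-over-exploration-graph idea: one round of mean-aggregation message passing over node features, followed by a per-node Q-value head, so the robot can pick the graph node with the highest predicted value as its next sensing action. The architecture below is an assumption, not this paper's exact network.

```python
import torch
import torch.nn as nn

class GraphQNet(nn.Module):
    """One message-passing round plus a Q-value head over exploration-graph nodes."""
    def __init__(self, node_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, node_dim))
        self.q_head = nn.Sequential(nn.Linear(node_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        """x: (N, node_dim) node features; edge_index: (2, E) src/dst index pairs."""
        src, dst = edge_index
        messages = self.msg(torch.cat([x[src], x[dst]], dim=-1))
        agg = torch.zeros_like(x).index_add_(0, dst, messages)  # sum messages per node
        deg = torch.zeros(x.size(0), 1).index_add_(0, dst, torch.ones(len(dst), 1))
        h = x + agg / deg.clamp(min=1)      # mean aggregation with a residual connection
        return self.q_head(h).squeeze(-1)   # one Q-value per graph node
```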
arXiv Detail & Related papers (2020-07-24T16:50:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.