Learning Navigation Costs from Demonstration in Partially Observable Environments
- URL: http://arxiv.org/abs/2002.11637v1
- Date: Wed, 26 Feb 2020 17:15:10 GMT
- Title: Learning Navigation Costs from Demonstration in Partially Observable Environments
- Authors: Tianyu Wang, Vikas Dhiman, Nikolay Atanasov
- Abstract summary: This paper focuses on inverse reinforcement learning (IRL) to enable safe and efficient autonomous navigation in unknown partially observable environments.
We develop a cost function representation composed of two parts: a probabilistic occupancy encoder, with recurrent dependence on the observation sequence, and a cost encoder, defined over the occupancy features.
Our model exceeds the accuracy of baseline IRL algorithms in robot navigation tasks, while substantially improving the efficiency of training and test-time inference.
- Score: 24.457042947946025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper focuses on inverse reinforcement learning (IRL) to enable safe and efficient autonomous navigation in unknown partially observable environments. The objective is to infer a cost function that explains expert-demonstrated navigation behavior while relying only on the observations and state-control trajectory used by the expert. We develop a cost function representation composed of two parts: a probabilistic occupancy encoder, with recurrent dependence on the observation sequence, and a cost encoder, defined over the occupancy features. The representation parameters are optimized by differentiating the error between demonstrated controls and a control policy computed from the cost encoder. Such differentiation is typically computed by dynamic programming through the value function over the whole state space. We observe that this is inefficient in large partially observable environments because most states are unexplored. Instead, we rely on a closed-form subgradient of the cost-to-go obtained only over a subset of promising states via an efficient motion-planning algorithm such as A* or RRT. Our experiments show that our model exceeds the accuracy of baseline IRL algorithms in robot navigation tasks, while substantially improving the efficiency of training and test-time inference.
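To make the training procedure the abstract describes concrete, here is a minimal PyTorch-style sketch. Everything in it is an illustrative assumption, not the authors' implementation: the layer sizes, the grid discretization, and especially `plan_cells`, a straight-line stand-in for the A*/RRT planner. The key idea it does reflect is that only the cells returned by the planner receive gradient, which is the closed-form subgradient of the cost-to-go over promising states.

```python
import numpy as np
import torch
import torch.nn as nn

class OccupancyEncoder(nn.Module):
    """Recurrent encoder: observation history -> per-cell occupancy features."""
    def __init__(self, obs_dim, feat_dim, n_cells):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, 64, batch_first=True)
        self.head = nn.Linear(64, n_cells * feat_dim)
        self.n_cells, self.feat_dim = n_cells, feat_dim

    def forward(self, obs_seq):                   # obs_seq: (1, T, obs_dim)
        _, h = self.rnn(obs_seq)
        return self.head(h[-1]).view(self.n_cells, self.feat_dim)

class CostEncoder(nn.Module):
    """Maps occupancy features to a positive cost per grid cell."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Softplus())

    def forward(self, feats):
        return self.net(feats).squeeze(-1)        # (n_cells,)

def plan_cells(start, goal, width):
    """Stand-in for the A*/RRT planner: flat indices of the cells on a
    straight line from start to goal (the real planner searches the cost map)."""
    n = max(abs(goal[0] - start[0]), abs(goal[1] - start[1])) + 1
    rows = np.linspace(start[0], goal[0], n).round().astype(int)
    cols = np.linspace(start[1], goal[1], n).round().astype(int)
    return (rows * width + cols).tolist()

def training_step(occ_enc, cost_enc, opt, obs_seq, state, goal, expert_u,
                  motions, width):
    feats = occ_enc(obs_seq)
    cost = cost_enc(feats)                        # differentiable cost map
    q = []
    for u in motions:                             # candidate controls
        nxt = (state[0] + u[0], state[1] + u[1])
        cells = plan_cells(nxt, goal, width)      # only "promising" states
        q.append(cost[cells].sum())               # cost-to-go; its subgradient
                                                  # touches only cells on the path
    logits = -torch.stack(q).unsqueeze(0)         # low cost-to-go -> high score
    loss = nn.functional.cross_entropy(logits, torch.tensor([expert_u]))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage on an 8x8 grid with 4 motion primitives.
occ, cenc = OccupancyEncoder(obs_dim=16, feat_dim=8, n_cells=64), CostEncoder(8)
opt = torch.optim.Adam(list(occ.parameters()) + list(cenc.parameters()), lr=1e-3)
obs = torch.randn(1, 5, 16)
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(training_step(occ, cenc, opt, obs, (4, 4), (7, 7), 0, moves, width=8))
```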
Related papers
- Cost-Aware Query Policies in Active Learning for Efficient Autonomous Robotic Exploration [0.0]
This paper analyzes an active learning (AL) algorithm for Gaussian Process regression that incorporates action cost.
A traditional uncertainty metric combined with a distance constraint best minimizes root-mean-square error over trajectory distance.
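A minimal sketch of that general idea (distance-constrained uncertainty sampling for a Gaussian Process; the acquisition rule, kernel, and names here are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_query(gp, candidates, robot_xy, max_travel):
    """Most uncertain candidate within a travel-distance budget."""
    _, std = gp.predict(candidates, return_std=True)
    travel = np.linalg.norm(candidates - robot_xy, axis=1)
    std[travel > max_travel] = -np.inf      # action-cost (distance) constraint
    return candidates[np.argmax(std)]

rng = np.random.default_rng(0)
X = rng.random((10, 2)); y = np.sin(3 * X).sum(axis=1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
grid = rng.random((500, 2))
print(next_query(gp, grid, robot_xy=np.array([0.5, 0.5]), max_travel=0.2))
```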
arXiv Detail & Related papers (2024-10-31T18:35:03Z)
- Multistep Inverse Is Not All You Need [87.62730694973696]
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise.
It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables.
We propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model.
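One plausible reading of those two ingredients, as a hedged PyTorch sketch (the architecture, loss weighting, and names are assumptions, not the ACDF reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentModel(nn.Module):
    def __init__(self, obs_dim, z_dim, n_actions):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, z_dim))
        self.inverse = nn.Linear(2 * z_dim, n_actions)   # (z_t, z_{t+k}) -> a_t
        self.dyn = nn.Linear(z_dim + n_actions, z_dim)   # latent forward model

    def loss(self, x_t, a_t, x_t1, x_tk):
        z_t, z_t1, z_tk = self.enc(x_t), self.enc(x_t1), self.enc(x_tk)
        inv_logits = self.inverse(torch.cat([z_t, z_tk], dim=-1))
        a_1hot = F.one_hot(a_t, inv_logits.shape[-1]).float()
        z_pred = self.dyn(torch.cat([z_t, a_1hot], dim=-1))
        return (F.cross_entropy(inv_logits, a_t)         # multistep inverse
                + F.mse_loss(z_pred, z_t1.detach()))     # forward consistency

m = LatentModel(obs_dim=8, z_dim=4, n_actions=3)
print(m.loss(torch.randn(5, 8), torch.randint(0, 3, (5,)),
             torch.randn(5, 8), torch.randn(5, 8)))
```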
arXiv Detail & Related papers (2024-03-18T16:36:01Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
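A toy illustration of why a visitation model yields values, under the assumption that a trained diffusion model can sample from the discounted state-visitation measure $d_\pi$: since $V(s_0) = \mathbb{E}_{s \sim d_\pi}[r(s)] / (1-\gamma)$, the value reduces to an average reward over samples. The sampler below is a Gaussian stand-in, not DVF itself:

```python
import numpy as np

def value_estimate(sample_visitation, reward_fn, s0, gamma=0.99, n=1024):
    """sample_visitation(s0, n) stands in for the trained diffusion sampler."""
    states = sample_visitation(s0, n)          # (n, state_dim) future states
    return reward_fn(states).mean() / (1.0 - gamma)

# Usage with stand-ins: a Gaussian "sampler" and a quadratic reward.
fake_sampler = lambda s0, n: s0 + np.random.randn(n, s0.shape[-1])
reward = lambda s: -np.sum(s**2, axis=-1)
print(value_estimate(fake_sampler, reward, s0=np.zeros(2)))
```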
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? [75.14973944905216]
We study the task of learning state representations from potentially high-dimensional observations.
We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning.
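A hedged sketch of that direct approach: fit the encoder and a linear latent model by regressing a planning-relevant quantity (here, the observed per-step cost) instead of reconstructing observations. Sizes and the exact regression targets are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectLatentModel(nn.Module):
    def __init__(self, obs_dim, z_dim, u_dim):
        super().__init__()
        self.enc = nn.Linear(obs_dim, z_dim)          # observation -> latent state
        self.A = nn.Linear(z_dim, z_dim, bias=False)  # latent dynamics
        self.B = nn.Linear(u_dim, z_dim, bias=False)
        self.cost = nn.Linear(z_dim, 1)               # planning-relevant target

    def loss(self, x_t, u_t, x_t1, c_t):
        z_t = self.enc(x_t)
        z_pred = self.A(z_t) + self.B(u_t)
        return (F.mse_loss(self.cost(z_t).squeeze(-1), c_t)     # predict step cost
                + F.mse_loss(z_pred, self.enc(x_t1).detach()))  # latent consistency

m = DirectLatentModel(obs_dim=10, z_dim=4, u_dim=2)
print(m.loss(torch.randn(8, 10), torch.randn(8, 2),
             torch.randn(8, 10), torch.randn(8)))
```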
arXiv Detail & Related papers (2022-12-30T01:42:04Z)
- TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers [64.88759709443819]
We suggest learning instance-dependent proxies intended to notably increase the efficiency of the search.
The first proxy is the correction factor, i.e., the ratio between the instance-independent cost-to-go estimate and the perfect one.
The second proxy is the path probability, which indicates how likely a grid cell is to lie on the shortest path.
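The correction factor slots into A* naturally: if cf(n) is the ratio of the instance-independent heuristic to the perfect cost-to-go, then h(n) / cf(n) approximates the perfect heuristic. A sketch with a stubbed-in predictor (TransPath itself uses a transformer; this is not its code):

```python
import heapq

def astar(start, goal, neighbors, h, cf):
    """cf(n) is the (learned) correction factor; cf(n) == 1 recovers plain A*."""
    open_set = [(h(start) / cf(start), 0.0, start, None)]
    parent, g_best = {}, {start: 0.0}
    while open_set:
        _, g, node, par = heapq.heappop(open_set)
        if node in parent:
            continue                       # already expanded
        parent[node] = par
        if node == goal:
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for nxt, step_cost in neighbors(node):
            ng = g + step_cost
            if ng < g_best.get(nxt, float("inf")):
                g_best[nxt] = ng
                heapq.heappush(open_set, (ng + h(nxt) / cf(nxt), ng, nxt, node))
    return None

goal = (3, 3)
h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])  # Manhattan distance
cf = lambda n: 1.0                                       # stub for the learned predictor
nbrs = lambda n: [((n[0] + dr, n[1] + dc), 1.0)
                  for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if 0 <= n[0] + dr < 4 and 0 <= n[1] + dc < 4]
print(astar((0, 0), goal, nbrs, h, cf))
```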
arXiv Detail & Related papers (2022-12-22T14:26:11Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method called value-consistent representation learning (VCR) to learn representations that are directly related to decision-making.
VCR imagines a future state with a learned model; instead of aligning this imagined state with the real state returned by the environment, it applies a $Q$-value head to both states and obtains two distributions of action values.
Our method achieves new state-of-the-art performance among search-free RL algorithms.
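A hedged sketch of that alignment step (the divergence, temperature, and stop-gradient placement are assumptions, not necessarily VCR's exact choices):

```python
import torch
import torch.nn.functional as F

def vcr_loss(q_head, z_imagined, z_real, temperature=1.0):
    """Align action-value distributions from imagined and real states."""
    log_p = F.log_softmax(q_head(z_imagined) / temperature, dim=-1)
    target = F.softmax(q_head(z_real) / temperature, dim=-1).detach()
    return F.kl_div(log_p, target, reduction="batchmean")

q_head = torch.nn.Linear(8, 4)      # 8-dim state features, 4 actions
print(vcr_loss(q_head, torch.randn(16, 8), torch.randn(16, 8)))
```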
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
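A minimal sketch of learning a barrier function from safe demonstrations with hinge losses; this simplifies the paper's robust output-feedback formulation to a discrete-time, full-state toy, and every constant here is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

barrier = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

def rocbf_loss(x, x_next, alpha=0.1, margin=0.1):
    """Hinge losses: demonstrated states are safe, and safety is preserved."""
    b = barrier(x).squeeze(-1)
    b_next = barrier(x_next).squeeze(-1)
    safe = F.relu(margin - b).mean()                       # B(x) >= margin on demos
    invariance = F.relu(-(b_next - b + alpha * b)).mean()  # discrete CBF condition
    return safe + invariance

x, x_next = torch.randn(32, 4), torch.randn(32, 4)
print(rocbf_loss(x, x_next))
```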
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- Goal-Directed Planning by Reinforcement Learning and Active Inference [16.694117274961016]
We propose a novel computational framework for decision-making based on Bayesian inference.
Goal-directed behavior is determined by planning from the posterior distribution of a latent variable $z$.
We demonstrate the effectiveness of the proposed framework by experiments in a sensorimotor navigation task with camera observations and continuous motor actions.
arXiv Detail & Related papers (2021-06-18T06:41:01Z)
- Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning [20.66819092398541]
This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations.
We develop a map encoder that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features.
We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of buildings, sidewalks, and road lanes.
arXiv Detail & Related papers (2021-01-01T07:41:08Z)
- Learning Navigation Costs from Demonstration with Semantic Observations [24.457042947946025]
This paper focuses on inverse reinforcement learning (IRL) for autonomous robot navigation using semantic observations.
We develop a map encoder, which infers semantic class probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features.
We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of cars, sidewalks, and road lanes.
arXiv Detail & Related papers (2020-06-09T04:35:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.