Sample Efficient Social Navigation Using Inverse Reinforcement Learning
- URL: http://arxiv.org/abs/2106.10318v1
- Date: Fri, 18 Jun 2021 19:07:41 GMT
- Title: Sample Efficient Social Navigation Using Inverse Reinforcement Learning
- Authors: Bobak H. Baghi, Gregory Dudek
- Abstract summary: We describe an inverse reinforcement learning-based algorithm which learns from human trajectory observations without knowing their specific actions.
We show that our approach yields better performance while also decreasing training time and sample complexity.
- Score: 11.764601181046498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present an algorithm to efficiently learn
socially-compliant navigation policies from observations of human trajectories.
As mobile robots come to inhabit and traffic social spaces, they must account
for social cues and behave in a socially compliant manner. We focus on learning
such cues from examples. We describe an inverse reinforcement learning-based
algorithm which learns from human trajectory observations without knowing their
specific actions. We increase the sample efficiency of our approach over
alternative methods by leveraging the notion of a replay buffer (found in many
off-policy reinforcement learning methods) to eliminate the additional sample
complexity associated with inverse reinforcement learning. We evaluate our
method by training agents using publicly available pedestrian motion data sets
and compare it to related methods. We show that our approach yields better
performance while also decreasing training time and sample complexity.
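To make the replay-buffer idea concrete, the following is a minimal sketch, not the authors' code: the linear reward, toy dynamics, and all names are illustrative assumptions. The point it shows is that a single buffer feeds both the IRL (reward) update and the off-policy policy update, so the IRL step consumes no additional environment samples:
```python
# Illustrative sketch of off-policy IRL from observations (not the authors' code).
import random
from collections import deque

import numpy as np

buffer = deque(maxlen=100_000)            # agent (state, next_state) pairs; no actions needed
expert_obs = [(np.zeros(2), np.ones(2))]  # stand-in for observed human trajectories

def features(s, s_next):
    """Transition features for a linear learned reward (illustrative)."""
    return np.concatenate([s, s_next])

w = np.zeros(4)  # learned-reward parameters

for step in range(1000):
    s = np.random.randn(2)                 # stand-in for one environment step
    s_next = s + 0.1 * np.random.randn(2)
    buffer.append((s, s_next))

    # IRL update: raise the reward on expert observations, lower it on replayed
    # agent transitions. The replay buffer supplies the negatives for free.
    se, se_next = random.choice(expert_obs)
    sa, sa_next = random.choice(buffer)
    w += 1e-3 * (features(se, se_next) - features(sa, sa_next))

    # An off-policy RL update (SAC/DQN-style) would reuse the same buffer,
    # relabelling stored transitions with the current reward w @ features(s, s_next).
```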
Related papers
- Online Context Learning for Socially-compliant Navigation [49.609656402450746]
This letter introduces an online context learning method that aims to empower robots to adapt to new social environments online.
Experiments using a community-wide simulator show that our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-06-17T12:59:13Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
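As a hedged illustration of the reward convention this summary describes, the sketch below relabels timesteps at which a human intervened with a negative reward; the class and field names are hypothetical:
```python
# Hypothetical sketch: intervention signals as the only reward (RLIF-style).
from dataclasses import dataclass

@dataclass
class Transition:
    obs: float
    action: int
    next_obs: float
    intervened: bool  # did the human expert take over at this step?

def intervention_reward(t: Transition) -> float:
    # The agent is penalized exactly when it forces an intervention;
    # all other timesteps carry zero reward.
    return -1.0 if t.intervened else 0.0

episode = [Transition(0.0, 0, 0.1, False), Transition(0.1, 1, 0.2, True)]
print([intervention_reward(t) for t in episode])  # [0.0, -1.0]
```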
- Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores [11.702616722462139]
This paper presents a new method that uses scores provided by humans, instead of pairwise preferences, to improve the feedback efficiency of interactive reinforcement learning.
We show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback than pairwise preference learning methods.
arXiv Detail & Related papers (2023-07-11T16:12:15Z)
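A minimal sketch of the contrast with pairwise preference learning, assuming a linear reward model and synthetic scores; each human score constrains one segment directly, whereas a preference only constrains a difference of two segments:
```python
# Illustrative sketch: fit a reward model to scalar human scores by regression.
import numpy as np

rng = np.random.default_rng(0)
segments = rng.standard_normal((20, 4))  # 20 trajectory segments, 4 features each
scores = segments @ np.array([1.0, -0.5, 0.0, 2.0]) + 0.1 * rng.standard_normal(20)

# Least-squares fit of r(x) = w @ x to the human scores (stand-in for a network).
w, *_ = np.linalg.lstsq(segments, scores, rcond=None)
print(w)
```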
- Offline Robot Reinforcement Learning with Uncertainty-Guided Human Expert Sampling [11.751910133386254]
Recent advances in batch (offline) reinforcement learning have shown promising results in learning from available offline data.
We propose a novel approach that uses uncertainty estimation to trigger the injection of human demonstration data.
Our experiments show that this approach is more sample-efficient than naively combining expert data with data collected from a sub-optimal agent.
arXiv Detail & Related papers (2022-12-16T01:41:59Z)
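An illustrative sketch of uncertainty-triggered injection, assuming ensemble disagreement as the uncertainty estimate; the linear critics, threshold, and shapes are placeholders rather than the paper's design:
```python
# Illustrative sketch: inject expert data where a critic ensemble disagrees.
import numpy as np

rng = np.random.default_rng(1)
critics = [rng.standard_normal(4) for _ in range(5)]  # ensemble of linear value heads

def uncertainty(state: np.ndarray) -> float:
    return float(np.std([w @ state for w in critics]))  # ensemble disagreement

demos = rng.standard_normal((32, 4))       # human demonstration states
batch = list(rng.standard_normal((8, 4)))  # agent-collected training batch
THRESHOLD = 1.0                            # assumed tuning knob

for state in list(batch):
    if uncertainty(state) > THRESHOLD:
        # High uncertainty: mix a human demonstration into the batch.
        batch.append(demos[rng.integers(len(demos))])
```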
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
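A small sketch of the deep-metric-learning ingredient, with a triplet loss over a linear embedding standing in for the paper's network; shapes and margin are assumptions:
```python
# Illustrative sketch: learn a similarity embedding from triplet annotations.
import numpy as np

rng = np.random.default_rng(2)
W = 0.1 * rng.standard_normal((2, 6))  # embeds 6-d trajectory features into 2-d

def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.sum((W @ anchor - W @ positive) ** 2)  # anchor-to-similar distance
    d_neg = np.sum((W @ anchor - W @ negative) ** 2)  # anchor-to-dissimilar distance
    return max(0.0, d_pos - d_neg + margin)           # hinge: pull positives closer

a, p, n = rng.standard_normal((3, 6))  # one user-provided triplet annotation
print(triplet_loss(a, p, n))
```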
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
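A minimal sketch of the relabeling trick named in the title: when the learned reward model is updated from new feedback, the rewards already stored in the replay buffer are rewritten rather than thrown away. The linear reward head is illustrative:
```python
# Illustrative sketch: relabel stored experience after a reward-model update.
import numpy as np

rng = np.random.default_rng(3)
buffer = [{"obs": rng.standard_normal(3), "reward": 0.0} for _ in range(5)]

def reward_model(obs: np.ndarray, w: np.ndarray) -> float:
    return float(w @ obs)  # stand-in for the learned reward network

w_new = rng.standard_normal(3)  # parameters after incorporating new human feedback

for transition in buffer:       # rewrite rewards in place; keep the experience
    transition["reward"] = reward_model(transition["obs"], w_new)
```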
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
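For background only, here is a sketch of the adversarial-IfO setup the summary contrasts with (GAIfO-style), not DEALIO itself: a discriminator scores state transitions, since no expert actions are available, and its output serves as the imitator's reward. A logistic model stands in for the discriminator network:
```python
# Background sketch (GAIfO-style, not DEALIO): a transition discriminator as reward.
import numpy as np

def discriminator(s, s_next, w):
    z = w @ np.concatenate([s, s_next])
    return 1.0 / (1.0 + np.exp(-z))  # P(transition came from the expert)

w = np.zeros(4)                       # discriminator parameters (illustrative)
s, s_next = np.ones(2), 1.1 * np.ones(2)

# The imitator is rewarded for transitions that look expert-like:
reward = np.log(discriminator(s, s_next, w) + 1e-8)
print(reward)
```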
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop TrajNet++, a large-scale interaction-centric benchmark, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
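A toy sketch of the reward-conditioned idea, with linear regression standing in for the policy network: logged (state, return, action) data of any quality supervises a policy that takes the desired return as an extra input, which is then set high at test time:
```python
# Illustrative sketch: a reward-conditioned policy trained by supervised learning.
import numpy as np

rng = np.random.default_rng(4)
states = rng.standard_normal((100, 3))
returns = rng.standard_normal(100)  # achieved returns, expert or not
actions = states @ np.array([0.5, -1.0, 0.2]) + 0.3 * returns  # synthetic labels

X = np.column_stack([states, returns])           # condition the policy on the return
w, *_ = np.linalg.lstsq(X, actions, rcond=None)  # plain supervised regression

# At test time, ask for a high return (here 2.0) to steer behavior:
print(np.append(rng.standard_normal(3), 2.0) @ w)
```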
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.