Universal Value Density Estimation for Imitation Learning and
Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2002.06473v1
- Date: Sat, 15 Feb 2020 23:46:29 GMT
- Title: Universal Value Density Estimation for Imitation Learning and
Goal-Conditioned Reinforcement Learning
- Authors: Yannick Schroecker, Charles Isbell
- Abstract summary: In both imitation learning and goal-conditioned reinforcement learning, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
- Score: 5.406386303264086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work considers two distinct settings: imitation learning and
goal-conditioned reinforcement learning. In either case, effective solutions
require the agent to reliably reach a specified state (a goal), or set of
states (a demonstration). Drawing a connection between probabilistic long-term
dynamics and the desired value function, this work introduces an approach which
utilizes recent advances in density estimation to effectively learn to reach a
given state. As our first contribution, we use this approach for
goal-conditioned reinforcement learning and show that it is both efficient and
does not suffer from hindsight bias in stochastic domains. As our second
contribution, we extend the approach to imitation learning and show that it
achieves state-of-the-art demonstration sample-efficiency on standard benchmark
tasks.
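
To make the core idea concrete, below is a minimal, hypothetical sketch of treating a learned conditional density over future states as a goal-conditioned value: a model q(g | s, a) is fit by maximum likelihood on hindsight goals drawn at geometrically distributed offsets (matching the discount factor), and its log-density then scores how reliably an action leads toward a commanded goal. The class and function names, the diagonal-Gaussian density model, and the exact sampling scheme are illustrative assumptions, not the authors' precise algorithm.

```python
# Minimal, hypothetical sketch (PyTorch): a conditional density over future states
# used as a goal-conditioned value. Names, the Gaussian model, and the geometric
# hindsight sampling are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalGoalDensity(nn.Module):
    """Models q(goal | state, action) as a diagonal Gaussian."""

    def __init__(self, state_dim, action_dim, goal_dim, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, goal_dim)
        self.log_std = nn.Linear(hidden, goal_dim)

    def log_prob(self, state, action, goal):
        h = self.trunk(torch.cat([state, action], dim=-1))
        std = self.log_std(h).clamp(-5.0, 2.0).exp()
        return torch.distributions.Normal(self.mean(h), std).log_prob(goal).sum(-1)


def hindsight_density_step(model, optimizer, trajectory, gamma=0.99):
    """One maximum-likelihood step on hindsight goals from a single rollout.

    `trajectory` is a list of (state, action) tensor pairs. Goals are future
    states sampled at geometrically distributed offsets, so the fitted density
    approximates the discounted future-state distribution that the
    goal-conditioned value is tied to (an assumption of this sketch).
    """
    horizon = len(trajectory)
    states, actions, goals = [], [], []
    for t in range(horizon - 1):
        offset = 1 + int(torch.distributions.Geometric(1.0 - gamma).sample().item())
        offset = min(offset, horizon - 1 - t)
        s, a = trajectory[t]
        g, _ = trajectory[t + offset]          # hindsight goal: a state actually reached
        states.append(s)
        actions.append(a)
        goals.append(g)
    s, a, g = map(torch.stack, (states, actions, goals))
    loss = -model.log_prob(s, a, g).mean()     # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At decision time, log q(g | s, a) can be evaluated for candidate actions and used in place of a learned Q-value for the commanded goal; a mixture density network or normalizing flow could replace the diagonal Gaussian when the distribution of reachable states is multimodal.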
Related papers
- Modeling of learning curves with applications to pos tagging [0.27624021966289597]
We introduce an algorithm to estimate the evolution of learning curves over an entire training database.
We iteratively approximate the sought value at the desired time, independently of the learning technique used.
The proposal is formally correct under our working hypotheses and includes a reliable proximity condition.
arXiv Detail & Related papers (2024-02-04T15:00:52Z) - Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z) - Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z) - Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA).
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
arXiv Detail & Related papers (2022-08-26T19:50:46Z) - Goal Recognition as Reinforcement Learning [20.651718821998106]
We develop a framework that combines model-free reinforcement learning and goal recognition.
This framework consists of two main stages: Offline learning of policies or utility functions for each potential goal, and online inference.
The resulting instantiation achieves state-of-the-art performance against goal recognizers on standard evaluation domains and superior performance in noisy environments.
arXiv Detail & Related papers (2022-02-13T16:16:43Z) - Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation learning from observation describes policy learning in a manner similar to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z) - Deterministic and Discriminative Imitation (D2-Imitation): Revisiting
Adversarial Imitation for Sample Efficiency [61.03922379081648]
We propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization.
Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation.
arXiv Detail & Related papers (2021-12-11T19:36:19Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z) - SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
In addition, SIMPLE formulates human detection and pose estimation as a unified point-learning framework so that the two tasks complement each other within a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z) - Physical Reasoning Using Dynamics-Aware Models [32.402950370430496]
This study aims to address the limitation by augmenting the reward value with additional supervisory signals about object dynamics.
Specifically, we define a distance measure between the trajectories of two target objects, and use this distance measure to characterize the similarity of two environment rollouts.
We train the model to correctly rank rollouts according to this measure in addition to predicting the correct reward (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-02-20T12:56:16Z)
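
The ranking-based training described in the last entry can be sketched as follows. The trajectory distance, scorer network, and margin ranking loss are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal, hypothetical sketch (PyTorch) of ranking rollouts by trajectory similarity
# while also regressing the reward. Distance, architecture, and margin are assumptions.
import torch
import torch.nn as nn


def trajectory_distance(traj_a, traj_b):
    """Mean Euclidean distance between two target-object trajectories of shape (T, D)."""
    return (traj_a - traj_b).norm(dim=-1).mean()


class RolloutScorer(nn.Module):
    """Scores a rollout feature vector; trained to predict reward and respect rankings."""

    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, rollout_feats):
        return self.net(rollout_feats).squeeze(-1)


def reward_and_ranking_loss(scorer, feats, rewards, trajs, reference_traj, margin=0.1):
    """Reward regression plus a pairwise margin ranking term.

    Rollouts whose target-object trajectory is closer to `reference_traj`
    should receive higher scores than rollouts that are farther away.
    """
    scores = scorer(feats)                                        # (N,)
    reward_loss = ((scores - rewards) ** 2).mean()
    dists = torch.stack([trajectory_distance(t, reference_traj) for t in trajs])
    closer = (dists.unsqueeze(1) < dists.unsqueeze(0)).float()    # closer[i, j] = 1 if i beats j
    gaps = scores.unsqueeze(1) - scores.unsqueeze(0)              # scores[i] - scores[j]
    rank_loss = (closer * torch.relu(margin - gaps)).sum() / closer.sum().clamp(min=1.0)
    return reward_loss + rank_loss
```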