C-Learning: Learning to Achieve Goals via Recursive Classification
- URL: http://arxiv.org/abs/2011.08909v2
- Date: Mon, 19 Apr 2021 18:33:47 GMT
- Title: C-Learning: Learning to Achieve Goals via Recursive Classification
- Authors: Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
- Abstract summary: We study the problem of predicting and controlling the future state distribution of an autonomous agent.
Our work lays a principled foundation for goal-conditioned RL as density estimation.
- Score: 163.7610618571879
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of predicting and controlling the future state
distribution of an autonomous agent. This problem, which can be viewed as a
reframing of goal-conditioned reinforcement learning (RL), is centered around
learning a conditional probability density function over future states. Instead
of directly estimating this density function, we indirectly estimate this
density function by training a classifier to predict whether an observation
comes from the future. Via Bayes' rule, predictions from our classifier can be
transformed into predictions over future states. Importantly, an off-policy
variant of our algorithm allows us to predict the future state distribution of
a new policy, without collecting new experience. This variant allows us to
optimize functionals of a policy's future state distribution, such as the
density of reaching a particular goal state. While conceptually similar to
Q-learning, our work lays a principled foundation for goal-conditioned RL as
density estimation, providing justification for goal-conditioned methods used
in prior work. This foundation makes hypotheses about Q-learning, including the
optimal goal-sampling ratio, which we confirm experimentally. Moreover, our
proposed method is competitive with prior goal-conditioned RL methods.
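The classifier-to-density conversion described in the abstract can be sketched in a few lines. The snippet below is an illustrative sketch of the Monte Carlo (on-policy) variant only, assuming a classifier C(s, a, s_future) trained to predict whether s_future was drawn from the future of (s, a) or sampled at random from the replay buffer; with equal class priors, Bayes' rule gives p(s_future | s, a) / p(s_future) = C / (1 - C). All function and variable names are illustrative, not taken from the authors' code, and the recursive (off-policy) variant is not shown.

```python
import torch
import torch.nn.functional as F

def classifier_loss(classifier, s, a, s_future, s_random):
    """Cross-entropy loss for the future-state classifier (Monte Carlo variant).

    Positives are states sampled from the future of (s, a) along the same
    trajectory; negatives are states sampled at random from the replay buffer.
    `classifier(s, a, goal)` is assumed to return P(goal came from the future).
    """
    p_pos = classifier(s, a, s_future)
    p_neg = classifier(s, a, s_random)
    return (F.binary_cross_entropy(p_pos, torch.ones_like(p_pos)) +
            F.binary_cross_entropy(p_neg, torch.zeros_like(p_neg)))

def future_state_density(classifier, s, a, goal, marginal_density):
    """Turn classifier outputs into an estimate of p(goal | s, a) via Bayes' rule.

    With equal class priors, C / (1 - C) equals the ratio
    p(goal | s, a) / p(goal), so multiplying by the marginal density of the
    goal recovers the conditional density over future states.
    """
    c = classifier(s, a, goal)
    ratio = c / (1.0 - c).clamp(min=1e-6)  # density ratio from Bayes' rule
    return ratio * marginal_density(goal)
```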
Related papers
- Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning [0.5999777817331317]
Data assimilation plays a pivotal role in diverse applications, ranging from climate predictions and weather forecasts to trajectory planning for autonomous vehicles.
Recent advancements have seen the emergence of deep learning approaches in this domain, primarily within a supervised learning framework.
In this study, we introduce a novel data assimilation strategy that utilizes reinforcement learning (RL) to apply state corrections using full or partial observations of the state variables.
arXiv Detail & Related papers (2024-01-01T06:53:36Z)
- Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation [3.1460691683829825]
Unsupervised source-free domain adaptation methods aim to train a model for the target domain utilizing a pretrained source-domain model and unlabeled target-domain data.
Traditional methods usually use self-training with pseudo-labeling, where pseudo-labels are typically filtered by thresholding on prediction confidence.
We propose a novel approach by incorporating a mean-teacher model, wherein the student network is trained using all predictions from the teacher network.
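A generic sketch of the mean-teacher self-training step described above, assuming the teacher is an exponential moving average (EMA) of the student and the student is trained on the teacher's soft predictions without confidence thresholding; this is a common mean-teacher formulation, not necessarily the paper's exact losses, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Teacher weights follow an exponential moving average of the student.

    The teacher is typically initialized as a deep copy of the student.
    """
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def self_training_step(student, teacher, images, optimizer):
    """One self-training step: the student learns from the teacher's soft
    predictions on target-domain images, with no confidence thresholding."""
    with torch.no_grad():
        teacher_probs = teacher(images).softmax(dim=1)   # soft pseudo-labels
    student_log_probs = student(images).log_softmax(dim=1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```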
arXiv Detail & Related papers (2022-12-08T12:20:35Z)
- Meta Reinforcement Learning with Finite Training Tasks -- a Density Estimation Approach [21.44737454610142]
In meta reinforcement learning (meta RL), an agent learns from a set of training tasks how to quickly solve a new task, drawn from the same task distribution.
The question we explore in this work is how many training tasks are required to guarantee approximately optimal behavior with high probability.
We propose a different approach: directly learn the task distribution, using density estimation techniques, and then train a policy on the learned task distribution.
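A minimal sketch of the "learn the task distribution, then train on it" idea, under the assumption that each training task is identified by a parameter vector (e.g., a goal location); the kernel density estimator and all names below are illustrative choices, not the estimator used in the paper.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_task_distribution(train_task_params, bandwidth=0.2):
    """Fit a density model to the parameters of the finite set of training tasks."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(np.asarray(train_task_params))
    return kde

def sample_training_tasks(kde, n_tasks, rng_seed=0):
    """Sample new task parameters from the learned task distribution.

    The meta-RL policy is then trained on tasks drawn from this learned
    distribution rather than only on the original finite training set.
    """
    return kde.sample(n_samples=n_tasks, random_state=rng_seed)
```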
arXiv Detail & Related papers (2022-06-21T20:32:19Z)
- Uncertainty estimation of pedestrian future trajectory using Bayesian approximation [137.00426219455116]
In dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify uncertainty during forecasting using Bayesian approximation, capturing uncertainty that deterministic approaches fail to model.
The effect of dropout weights and of long-term prediction on future-state uncertainty is studied.
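Dropout-based Bayesian approximation of this kind is commonly implemented as Monte Carlo dropout: dropout stays active at test time and the spread over several stochastic forward passes serves as the uncertainty estimate. The routine below is a generic sketch under that assumption, not the paper's model; names are illustrative.

```python
import torch

def mc_dropout_forecast(model, history, n_samples=50):
    """Estimate predictive mean and uncertainty of a trajectory forecaster.

    Keeps dropout layers active at inference time, runs several stochastic
    forward passes, and summarizes them with a mean prediction and a
    per-coordinate standard deviation (the uncertainty estimate).
    """
    model.train()  # keep dropout active; assumes no batch-norm side effects
    with torch.no_grad():
        samples = torch.stack([model(history) for _ in range(n_samples)])
    model.eval()
    return samples.mean(dim=0), samples.std(dim=0)
```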
arXiv Detail & Related papers (2022-05-04T04:23:38Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
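Implicit Q-learning is known for fitting a state-value function with expectile regression on dataset actions only, so the Bellman backup never queries actions outside the dataset. A minimal sketch of that expectile loss (illustrative names, not the authors' code):

```python
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    """Expectile regression loss that fits V(s) toward an upper expectile of Q(s, a).

    With tau > 0.5 the loss weights positive errors (Q above V) more heavily,
    so V approaches the value of the best in-dataset actions without ever
    evaluating actions outside the dataset.
    """
    diff = q_values - v_values
    weight = torch.abs(tau - (diff < 0).float())  # |tau - 1(u < 0)|
    return (weight * diff.pow(2)).mean()
```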
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the estimated density ratio reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
arXiv Detail & Related papers (2020-10-08T02:10:54Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
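For context, a generic off-policy evaluation routine with linear function approximation (an LSTD-style solver) is sketched below; it only illustrates the problem setup and is not the minimax-optimal estimator analyzed in the paper. Feature inputs and names are assumptions.

```python
import numpy as np

def lstd_q_evaluation(phi_sa, phi_next_sa, rewards, gamma=0.99, reg=1e-6):
    """LSTD-style off-policy evaluation with linear function approximation.

    phi_sa:      features of logged (state, action) pairs, shape (n, d)
    phi_next_sa: features of (next_state, target_policy_action), shape (n, d)
    rewards:     logged rewards, shape (n,)

    Solves A w = b for the weights of Q_pi(s, a) ~= phi(s, a) @ w, where
    A = sum phi (phi - gamma * phi_next)^T and b = sum phi * r.
    """
    a_mat = phi_sa.T @ (phi_sa - gamma * phi_next_sa) + reg * np.eye(phi_sa.shape[1])
    b_vec = phi_sa.T @ rewards
    return np.linalg.solve(a_mat, b_vec)

# The target policy's value is then estimated by averaging
# phi(s0, pi(s0)) @ w over initial states s0.
```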
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
- Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings [0.0]
We construct confidence intervals (CIs) for a policy's value in infinite horizon settings where the number of decision points diverges to infinity.
We show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique.
We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients' health status.
arXiv Detail & Related papers (2020-01-13T19:42:40Z)