Diversify & Conquer: Outcome-directed Curriculum RL via
Out-of-Distribution Disagreement
- URL: http://arxiv.org/abs/2310.19261v1
- Date: Mon, 30 Oct 2023 04:12:19 GMT
- Title: Diversify & Conquer: Outcome-directed Curriculum RL via
Out-of-Distribution Disagreement
- Authors: Daesol Cho, Seungjae Lee, and H. Jin Kim
- Abstract summary: Reinforcement learning (RL) often faces the challenge of uninformed search, where the agent must explore without access to domain knowledge.
This work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C).
Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment.
- Score: 30.21954044028645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) often faces the challenge of uninformed
search, where the agent must explore without access to domain knowledge such as
the characteristics of the environment or external rewards. To tackle this
challenge, this work proposes a new approach for curriculum RL called Diversify
for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods,
D2C requires only a few examples of desired outcomes and works in any
environment, regardless of its geometry or the distribution of the desired
outcome examples. The proposed method diversifies goal-conditional classifiers
to identify similarities between visited and desired outcome states, and
ensures that the classifiers disagree on out-of-distribution states, which
makes it possible to quantify the unexplored region and to design an arbitrary
goal-conditioned intrinsic reward signal in a simple and intuitive way. The
method then employs bipartite matching to define a curriculum learning
objective that produces a sequence of well-adjusted intermediate goals, which
enables the agent to automatically explore and conquer the unexplored region.
We present experimental results demonstrating that D2C outperforms prior
curriculum RL methods both quantitatively and qualitatively, even when the
desired outcome examples are arbitrarily distributed.
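To make the two mechanisms described in the abstract concrete, here is a minimal NumPy/SciPy sketch of how an ensemble-disagreement signal could quantify out-of-distribution states and how bipartite matching could then select intermediate curriculum goals. It is an illustrative sketch, not the authors' implementation: the diversified classifiers are stood in by placeholder probabilities, and the cost function, the frontier_weight parameter, and all function names are assumptions made for this example.

```python
# Illustrative sketch only (not the authors' code): ensemble disagreement as an
# exploration signal, plus bipartite matching to pick intermediate goals.
import numpy as np
from scipy.optimize import linear_sum_assignment


def disagreement_reward(classifier_probs: np.ndarray) -> np.ndarray:
    """Quantify how out-of-distribution each state looks.

    classifier_probs: shape (K, N), the outputs of K diversified
    goal-conditional classifiers on N candidate states. High variance across
    the ensemble marks states outside the explored region.
    """
    return classifier_probs.std(axis=0)


def select_curriculum_goals(candidate_states: np.ndarray,
                            desired_outcomes: np.ndarray,
                            classifier_probs: np.ndarray,
                            frontier_weight: float = 1.0) -> np.ndarray:
    """Match one intermediate goal to each desired outcome example.

    The (assumed) cost trades off closeness of a visited candidate state to a
    desired outcome against how strongly the ensemble disagrees on it, so the
    matched goals sit on the exploration frontier.
    """
    # Pairwise Euclidean distances, shape (N candidates, M desired outcomes).
    dists = np.linalg.norm(
        candidate_states[:, None, :] - desired_outcomes[None, :, :], axis=-1)
    frontier = disagreement_reward(classifier_probs)        # shape (N,)
    cost = dists - frontier_weight * frontier[:, None]      # shape (N, M)
    # Hungarian algorithm: assign a distinct candidate to each desired outcome.
    rows, cols = linear_sum_assignment(cost)
    order = np.argsort(cols)
    return candidate_states[rows[order]]  # goals[i] targets desired_outcomes[i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.normal(size=(50, 2))           # states the agent has reached
    desired = rng.normal(loc=5.0, size=(3, 2))   # a few desired outcome examples
    probs = rng.uniform(size=(5, 50))            # placeholder ensemble outputs
    print("intermediate goals:\n",
          select_curriculum_goals(visited, desired, probs))
```

In D2C the classifiers would actually be trained so that they agree on visited and desired outcome states but disagree elsewhere; the random probabilities above are placeholders that only show how the disagreement signal and the matching step fit together.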
Related papers
- Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space [19.312306559210125]
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions.
Guiding the user's interaction with AI systems by proposing easy-to-understand explanations is essential for the trustworthy adoption and long-term acceptance of AI systems.
We introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions.
arXiv Detail & Related papers (2023-07-25T10:21:26Z)
- Off-policy Evaluation in Doubly Inhomogeneous Environments [29.434386775600498]
We develop a general OPE framework that consists of both model-based and model-free approaches.
This is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities.
arXiv Detail & Related papers (2023-06-14T19:48:30Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We additionally prove a bound on the expected return for out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning [20.680417111485305]
We introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation.
The improved distributional estimates lend themselves to UCB-based exploration.
We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control.
arXiv Detail & Related papers (2022-02-06T03:27:05Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Process discovery on deviant traces and other stranger things [6.974048370610024]
We focus on declarative processes and embrace the less-popular view of process discovery as a binary supervised learning task.
We investigate how the valuable information carried by both sets of traces can be extracted and formalised into a model that is "optimal" according to user-defined goals.
arXiv Detail & Related papers (2021-09-30T06:58:34Z)
- Concurrent Discrimination and Alignment for Self-Supervised Feature Learning [52.213140525321165]
Existing self-supervised learning methods learn by means of pretext tasks that are either (1) discriminating, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be brought close together.
In this work, we combine the positive aspects of the discriminating and aligning methods, and design a hybrid method that addresses the above issue.
Our method explicitly specifies the repulsion mechanism through a discriminative predictive task and the attraction mechanism by concurrently maximizing mutual information between paired views.
Our experiments on nine established benchmarks show that the proposed model consistently outperforms the existing state-of-the-art results of self-supervised and transfer learning.
arXiv Detail & Related papers (2021-08-19T09:07:41Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- Universal Source-Free Domain Adaptation [57.37520645827318]
We propose a novel two-stage learning process for domain adaptation.
In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift.
In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps.
arXiv Detail & Related papers (2020-04-09T07:26:20Z)
- Contradictory Structure Learning for Semi-supervised Domain Adaptation [67.89665267469053]
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
arXiv Detail & Related papers (2020-02-06T22:58:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.