Diversify & Conquer: Outcome-directed Curriculum RL via
Out-of-Distribution Disagreement
- URL: http://arxiv.org/abs/2310.19261v1
- Date: Mon, 30 Oct 2023 04:12:19 GMT
- Title: Diversify & Conquer: Outcome-directed Curriculum RL via
Out-of-Distribution Disagreement
- Authors: Daesol Cho, Seungjae Lee, and H. Jin Kim
- Abstract summary: Reinforcement learning (RL) often faces the challenge of uninformed search, where the agent must explore without access to domain knowledge.
This work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C).
Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment.
- Score: 30.21954044028645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) often faces the challenge of uninformed
search, where the agent must explore without access to domain knowledge such as
the characteristics of the environment or external rewards. To tackle this
challenge, this work proposes a new approach for curriculum RL called Diversify
for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods,
D2C requires only a few examples of desired outcomes and works in any
environment, regardless of its geometry or the distribution of the desired
outcome examples. The proposed method diversifies goal-conditional classifiers
to identify similarities between visited and desired outcome states, and
ensures that the classifiers disagree on out-of-distribution states, which
makes it possible to quantify the unexplored region and to design an arbitrary
goal-conditioned intrinsic reward signal in a simple and intuitive way. The
method then employs bipartite matching to define a curriculum learning
objective that produces a sequence of well-adjusted intermediate goals, which
enables the agent to automatically explore and conquer the unexplored region.
We present experimental results demonstrating that D2C outperforms prior
curriculum RL methods both quantitatively and qualitatively, even when the
desired outcome examples are arbitrarily distributed.
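To make the two mechanisms described in the abstract concrete, here is a minimal NumPy/SciPy sketch of how an ensemble-disagreement signal could quantify out-of-distribution states and how bipartite matching could then select intermediate curriculum goals. It is an illustrative sketch, not the authors' implementation: the diversified classifiers are stood in by placeholder probabilities, and the cost function, the frontier_weight parameter, and all function names are assumptions made for this example.

```python
# Illustrative sketch only (not the authors' code): ensemble disagreement as an
# exploration signal, plus bipartite matching to pick intermediate goals.
import numpy as np
from scipy.optimize import linear_sum_assignment


def disagreement_reward(classifier_probs: np.ndarray) -> np.ndarray:
    """Quantify how out-of-distribution each state looks.

    classifier_probs: shape (K, N), the outputs of K diversified
    goal-conditional classifiers on N candidate states. High variance across
    the ensemble marks states outside the explored region.
    """
    return classifier_probs.std(axis=0)


def select_curriculum_goals(candidate_states: np.ndarray,
                            desired_outcomes: np.ndarray,
                            classifier_probs: np.ndarray,
                            frontier_weight: float = 1.0) -> np.ndarray:
    """Match one intermediate goal to each desired outcome example.

    The (assumed) cost trades off closeness of a visited candidate state to a
    desired outcome against how strongly the ensemble disagrees on it, so the
    matched goals sit on the exploration frontier.
    """
    # Pairwise Euclidean distances, shape (N candidates, M desired outcomes).
    dists = np.linalg.norm(
        candidate_states[:, None, :] - desired_outcomes[None, :, :], axis=-1)
    frontier = disagreement_reward(classifier_probs)        # shape (N,)
    cost = dists - frontier_weight * frontier[:, None]      # shape (N, M)
    # Hungarian algorithm: assign a distinct candidate to each desired outcome.
    rows, cols = linear_sum_assignment(cost)
    order = np.argsort(cols)
    return candidate_states[rows[order]]  # goals[i] targets desired_outcomes[i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    visited = rng.normal(size=(50, 2))           # states the agent has reached
    desired = rng.normal(loc=5.0, size=(3, 2))   # a few desired outcome examples
    probs = rng.uniform(size=(5, 50))            # placeholder ensemble outputs
    print("intermediate goals:\n",
          select_curriculum_goals(visited, desired, probs))
```

In D2C the classifiers would actually be trained so that they agree on visited and desired outcome states but disagree elsewhere; the random probabilities above are placeholders that only show how the disagreement signal and the matching step fit together.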
Related papers
- Counterfactual Explanation via Search in Gaussian Mixture Distributed Latent Space [19.312306559210125]
Counterfactual Explanations (CEs) are an important tool in Algorithmic Recourse for addressing two questions.
Guiding the user's interaction with AI systems by proposing easy-to-understand explanations is essential for the trustworthy adoption and long-term acceptance of AI systems.
We introduce a new method to generate CEs for a pre-trained binary classifier by first shaping the latent space of an autoencoder to be a mixture of Gaussian distributions.
arXiv Detail & Related papers (2023-07-25T10:21:26Z)
- Off-policy Evaluation in Doubly Inhomogeneous Environments [29.434386775600498]
We develop a general OPE framework that consists of both model-based and model-free approaches.
This is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities.
arXiv Detail & Related papers (2023-06-14T19:48:30Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We additionally prove a bound on the expected return for out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning [20.680417111485305]
We introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation.
The improved distributional estimates lend themselves to UCB-based exploration.
We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control.
arXiv Detail & Related papers (2022-02-06T03:27:05Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Process discovery on deviant traces and other stranger things [6.974048370610024]
We focus on declarative processes and embrace the less-popular view of process discovery as a binary supervised learning task.
We investigate how the valuable information carried by both sets of traces can be extracted and formalised into a model that is "optimal" according to user-defined goals.
arXiv Detail & Related papers (2021-09-30T06:58:34Z)
- Concurrent Discrimination and Alignment for Self-Supervised Feature Learning [52.213140525321165]
Existing self-supervised learning methods learn by means of pretext tasks that are either (1) discriminating, explicitly specifying which features should be separated, or (2) aligning, precisely indicating which features should be brought close together.
In this work, we combine the positive aspects of the discriminating and aligning methods, and design a hybrid method that addresses the above issue.
Our method explicitly specifies the repulsion mechanism through a discriminative predictive task and the attraction mechanism by concurrently maximizing mutual information between paired views.
Our experiments on nine established benchmarks show that the proposed model consistently outperforms the existing state-of-the-art results of self-supervised and transfer learning.
arXiv Detail & Related papers (2021-08-19T09:07:41Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- Universal Source-Free Domain Adaptation [57.37520645827318]
We propose a novel two-stage learning process for domain adaptation.
In the Procurement stage, we aim to equip the model for future source-free deployment, assuming no prior knowledge of the upcoming category-gap and domain-shift.
In the Deployment stage, the goal is to design a unified adaptation algorithm capable of operating across a wide range of category-gaps.
arXiv Detail & Related papers (2020-04-09T07:26:20Z)
- Contradictory Structure Learning for Semi-supervised Domain Adaptation [67.89665267469053]
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
arXiv Detail & Related papers (2020-02-06T22:58:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.