Symphony of experts: orchestration with adversarial insights in
reinforcement learning
- URL: http://arxiv.org/abs/2310.16473v1
- Date: Wed, 25 Oct 2023 08:53:51 GMT
- Title: Symphony of experts: orchestration with adversarial insights in
reinforcement learning
- Authors: Matthieu Jonckheere (LAAS), Chiara Mignacco (LMO, CELESTE), Gilles
Stoltz (LMO, CELESTE)
- Abstract summary: We explore the concept of orchestration, where a set of expert policies guides decision-making.
We extend the analysis of natural policy gradient to arbitrary adversarial aggregation strategies.
A key point of our approach lies in its arguably more transparent proofs compared to existing methods.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured reinforcement learning leverages policies with advantageous
properties to reach better performance, particularly in scenarios where
exploration poses challenges. We explore this field through the concept of
orchestration, where a (small) set of expert policies guides decision-making;
the modeling thereof constitutes our first contribution. We then establish
value-function regret bounds for orchestration in the tabular setting by
transferring regret-bound results from adversarial settings. We generalize and
extend the analysis of natural policy gradient in Agarwal et al. [2021, Section
5.3] to arbitrary adversarial aggregation strategies. We also extend it to the
case of estimated advantage functions, providing insights into sample
complexity both in expectation and high probability. A key point of our
approach lies in its arguably more transparent proofs compared to existing
methods. Finally, we present simulations for a stochastic matching toy model.
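
The abstract above describes orchestration as aggregating a small set of expert policies with an adversarial (exponential-weights style) strategy, analyzed through the natural policy gradient. The sketch below is a minimal illustration of that idea, not the paper's algorithm: it maintains per-state exponential weights over a few expert policies on an assumed toy tabular MDP and updates them multiplicatively using exact one-step advantages. The MDP, the experts, the step size, and the precise form of the update are all assumptions made for the example.

```python
# Minimal sketch: orchestration of expert policies via per-state exponential
# weights (a multiplicative-weights / NPG-flavored update on advantages).
# Everything below (MDP, experts, step size) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP: S states, A actions, discount gamma (assumed for illustration).
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a, s'] transition kernel
R = rng.uniform(0.0, 1.0, size=(S, A))           # R[s, a] rewards

# A small set of expert policies (here simply random stochastic policies).
K = 4
experts = rng.dirichlet(np.ones(A), size=(K, S)) # experts[k, s, a]

def policy_from_weights(w):
    """Orchestrated policy: at each state, mix the experts with weights w[s, k]."""
    return np.einsum("sk,ksa->sa", w, experts)

def evaluate(pi):
    """Exact policy evaluation for a tabular policy pi[s, a]; returns Q[s, a]."""
    P_pi = np.einsum("sa,sap->sp", pi, P)        # induced state-to-state kernel
    R_pi = np.einsum("sa,sa->s", pi, R)          # expected one-step reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return R + gamma * P @ V                     # Q[s, a]

# Orchestration loop: exponential weights over experts, per state.
eta, T = 0.5, 100
w = np.full((S, K), 1.0 / K)                     # uniform prior over experts

for t in range(T):
    pi = policy_from_weights(w)
    Q = evaluate(pi)                             # exact here; estimated in general
    V = np.einsum("sa,sa->s", pi, Q)
    # Advantage of following expert k for one step at state s, then pi.
    adv = np.einsum("ksa,sa->sk", experts, Q) - V[:, None]
    w = w * np.exp(eta * adv)                    # multiplicative-weights step
    w /= w.sum(axis=1, keepdims=True)

print("final expert weights per state:\n", np.round(w, 3))
```

In the paper's setting the advantages would be estimated from samples rather than computed exactly (the case covered by its sample-complexity analysis); exact policy evaluation is used here only to keep the sketch short.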
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, the (hyper)policies learned at convergence are deployed only in their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Hierarchical Decomposition of Prompt-Based Continual Learning:
Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - Consciousness-Inspired Spatio-Temporal Abstractions for Better Generalization in Reinforcement Learning [83.41487567765871]
Skipper is a model-based reinforcement learning framework.
It automatically decomposes the given task into smaller, more manageable subtasks.
It enables sparse decision-making and focused abstractions on the relevant parts of the environment.
arXiv Detail & Related papers (2023-09-30T02:25:18Z) - Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level
Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time-step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z) - Understanding and Constructing Latent Modality Structures in Multi-modal
Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z) - Performative Reinforcement Learning [8.07595093287034]
We introduce the concept of a performatively stable policy.
We show that repeatedly optimizing this objective converges to a performatively stable policy.
arXiv Detail & Related papers (2022-06-30T18:26:03Z) - On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z) - Explaining, Evaluating and Enhancing Neural Networks' Learned
Representations [2.1485350418225244]
We show how explainability can be an aid, rather than an obstacle, towards better and more efficient representations.
We employ such attributions to define two novel scores for evaluating the informativeness and the disentanglement of latent embeddings.
We show that adopting our proposed scores as constraints during the training of a representation learning task improves the downstream performance of the model.
arXiv Detail & Related papers (2022-02-18T19:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.