Bayesian Experience Reuse for Learning from Multiple Demonstrators
- URL: http://arxiv.org/abs/2006.05725v1
- Date: Wed, 10 Jun 2020 08:32:39 GMT
- Title: Bayesian Experience Reuse for Learning from Multiple Demonstrators
- Authors: Michael Gimelfarb, Scott Sanner, Chi-Guhn Lee
- Abstract summary: Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts.
We address this problem by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors.
We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models.
- Score: 24.489002406693128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from demonstrations (LfD) improves the exploration efficiency of a
learning agent by incorporating demonstrations from experts. However,
demonstration data can often come from multiple experts with conflicting goals,
making it difficult to incorporate safely and effectively in online settings.
We address this problem in the static and dynamic optimization settings by
modelling the uncertainty in source and target task functions using
normal-inverse-gamma priors, whose corresponding posteriors are, respectively,
learned from demonstrations and target data using Bayesian neural networks with
shared features. We use this learned belief to derive a quadratic programming
problem whose solution yields a probability distribution over the expert
models. Finally, we propose Bayesian Experience Reuse (BERS) to sample
demonstrations in accordance with this distribution and reuse them directly in
new tasks. We demonstrate the effectiveness of this approach for static
optimization of smooth functions, and transfer learning in a high-dimensional
supply chain problem with cost uncertainty.
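To make the abstract's moving parts concrete, below is a minimal, hedged Python sketch of the two ingredients it names: a normal-inverse-gamma (NIG) posterior over a task function's mean and variance, and a simplex-constrained quadratic program whose solution is a probability distribution over expert models. The function names, the closed-form i.i.d. Gaussian update, and the mean-matching QP objective are illustrative assumptions only; the paper itself learns the NIG posteriors with Bayesian neural networks with shared features and derives its QP objective from those learned beliefs.

```python
# Hedged sketch, not the paper's exact algorithm. Two ingredients from the
# abstract: (1) a normal-inverse-gamma (NIG) posterior over a task function's
# mean and variance, here the closed-form update for i.i.d. Gaussian samples
# (the paper learns these posteriors with Bayesian neural networks with
# shared features), and (2) a simplex-constrained quadratic program whose
# solution is a probability distribution over expert models. The mean-matching
# objective below is an illustrative stand-in for the paper's derived QP.
import numpy as np
from scipy.optimize import minimize

def nig_update(mu0, kappa0, alpha0, beta0, x):
    """Conjugate NIG posterior update given observations x of one task."""
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * np.sum((x - xbar) ** 2)
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, alpha_n, beta_n

def expert_distribution(expert_means, target_mean):
    """Solve min_w ||M^T w - mu||^2  s.t.  w >= 0, sum(w) = 1 (via SLSQP)."""
    M = np.asarray(expert_means)                      # (K, D) posterior means
    K = M.shape[0]
    objective = lambda w: np.sum((M.T @ w - target_mean) ** 2)
    simplex = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(objective, np.full(K, 1.0 / K),
                   bounds=[(0.0, 1.0)] * K, constraints=simplex)
    return res.x

# Toy example: posterior means of two source tasks and the target task.
mu_a = nig_update(0.0, 1.0, 1.0, 1.0, [0.9, 1.1, 1.0])[0]
mu_b = nig_update(0.0, 1.0, 1.0, 1.0, [-1.0, -0.8, -1.2])[0]
mu_t = nig_update(0.0, 1.0, 1.0, 1.0, [0.7, 0.8])[0]
print(expert_distribution([[mu_a], [mu_b]], np.array([mu_t])))
# Mass concentrates on the first expert, whose belief matches the target.
```

Given such weights, BERS-style reuse would then sample stored demonstrations from each expert in proportion to its weight and feed them into learning on the new task, as the abstract describes.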
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z)
- The Common Stability Mechanism behind most Self-Supervised Learning Approaches [64.40701218561921]
We provide a framework to explain the stability mechanism of different self-supervised learning techniques.
We discuss the working mechanism of contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO.
We formulate different hypotheses and test them using the Imagenet100 dataset.
arXiv Detail & Related papers (2024-02-22T20:36:24Z)
- Inverse Reinforcement Learning by Estimating Expertise of Demonstrators [18.50354748863624]
IRLEED, Inverse Reinforcement Learning by Estimating Expertise of Demonstrators, is a novel framework that overcomes the hurdles posed by suboptimal demonstrations without requiring prior knowledge of demonstrator expertise.
IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by incorporating a general model of demonstrator suboptimality that addresses reward bias and action variance.
Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness.
arXiv Detail & Related papers (2024-02-02T20:21:09Z)
- Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z)
- One-Shot Federated Learning with Classifier-Guided Diffusion Models [44.604485649167216]
One-shot federated learning (OSFL) has gained attention in recent years due to its low communication cost.
In this paper, we explore the novel opportunities that diffusion models bring to OSFL and propose FedCADO.
FedCADO generates data that complies with clients' distributions and subsequently trains the aggregated model on the server.
arXiv Detail & Related papers (2023-11-15T11:11:25Z)
- Pre-trained Recommender Systems: A Causal Debiasing Perspective [19.712997823535066]
We develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains.
Our empirical studies show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings.
arXiv Detail & Related papers (2023-10-30T03:37:32Z)
- Last Layer Marginal Likelihood for Invariance Learning [12.00078928875924]
We introduce a new lower bound to the marginal likelihood, which allows us to perform inference for a larger class of likelihood functions.
We work towards bringing this approach to neural networks by using an architecture with a Gaussian process in the last layer.
arXiv Detail & Related papers (2021-06-14T15:40:51Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples (a minimal sketch of this update appears after this entry).
We propose to meta-learn the confidence of each query sample in order to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
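As a side note on the mechanism the entry above references, the following is a small, hypothetical sketch of confidence-weighted transductive prototype refinement. The function name and the confidence rule are assumptions: the paper meta-learns the confidence function, whereas this sketch substitutes a fixed softmax over negative query-prototype distances.

```python
# Hypothetical sketch of confidence-weighted transductive prototype
# refinement. The paper meta-learns the confidence function; this sketch
# substitutes a fixed softmax over negative query-prototype distances.
import numpy as np

def refine_prototypes(prototypes, queries, steps=3, temperature=1.0):
    """prototypes: (C, D) support-set class means; queries: (Q, D) unlabeled."""
    for _ in range(steps):
        # Per-class confidence of each query from its distance to prototypes.
        d = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=-1)
        conf = np.exp(-d / temperature)
        conf /= conf.sum(axis=1, keepdims=True)        # (Q, C), rows sum to 1
        # Move each prototype to the confidence-weighted mean of the queries.
        prototypes = (conf.T @ queries) / conf.sum(axis=0)[:, None]
    return prototypes

# Two well-separated classes; unlabeled queries sharpen both prototypes.
protos = np.array([[0.0, 0.0], [4.0, 4.0]])
queries = np.array([[0.2, -0.1], [3.8, 4.1], [4.2, 3.9]])
print(refine_prototypes(protos, queries))
```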
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.