Leveraging Demonstrations to Improve Online Learning: Quality Matters
- URL: http://arxiv.org/abs/2302.03319v4
- Date: Wed, 17 May 2023 17:47:15 GMT
- Title: Leveraging Demonstrations to Improve Online Learning: Quality Matters
- Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
- Abstract summary: We show that the degree of improvement must depend on the quality of the demonstration data.
We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule.
- Score: 54.98983862640944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the extent to which offline demonstration data can improve
online learning. It is natural to expect some improvement, but the question is
how, and by how much? We show that the degree of improvement must depend on the
quality of the demonstration data. To generate portable insights, we focus on
Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online
learning algorithm and model. The demonstration data is generated by an expert
with a given competence level, a notion we introduce. We propose an informed TS
algorithm that utilizes the demonstration data in a coherent way through Bayes'
rule and derive a prior-dependent Bayesian regret bound. This offers insight
into how pretraining can greatly improve online performance and how the degree
of improvement increases with the expert's competence level. We also develop a
practical, approximate informed TS algorithm through Bayesian bootstrapping and
show substantial empirical regret reduction through experiments.
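The abstract's core mechanism folds expert demonstrations into the Thompson sampling prior via Bayes' rule. The small sketch below illustrates it under assumptions that are ours, not necessarily the paper's exact setup. The bandit has Bernoulli rewards, and its unknown mean vector is one of a finite set of candidates. The expert's competence level is modeled as an inverse temperature `beta` in a softmax (Boltzmann) policy over the true means. The function names `expert_policy`, `informed_prior`, and `thompson_sampling` are hypothetical, and exact enumeration over candidates stands in for the Bayesian-bootstrapping approximation the abstract describes.

```python
import math
import random

# Sketch only: a K-armed Bernoulli bandit whose unknown mean vector is one of a
# finite set of candidates, so Bayes' rule can be applied exactly.


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def expert_policy(theta, beta):
    """Assumed expert model: choose arm a with probability proportional to exp(beta * theta[a])."""
    m = max(theta)
    weights = [math.exp(beta * (t - m)) for t in theta]
    return normalize(weights)


def informed_prior(candidates, demos, beta):
    """Posterior over candidates after observing the expert's demonstrated arms (uniform base prior)."""
    log_post = []
    for theta in candidates:
        probs = expert_policy(theta, beta)
        log_post.append(sum(math.log(probs[a]) for a in demos))
    m = max(log_post)
    return normalize([math.exp(lp - m) for lp in log_post])


def thompson_sampling(candidates, prior, true_theta, horizon, rng):
    """Run TS from the given prior and return the cumulative (pseudo-)regret."""
    post = list(prior)
    best = max(true_theta)
    regret = 0.0
    for _ in range(horizon):
        # Sample a candidate environment from the posterior and act greedily for it.
        i = rng.choices(range(len(candidates)), weights=post)[0]
        arm = max(range(len(true_theta)), key=lambda a: candidates[i][a])
        reward = rng.random() < true_theta[arm]
        regret += best - true_theta[arm]
        # Exact Bayesian update with the Bernoulli likelihood of the observed reward.
        post = normalize([p * (c[arm] if reward else 1.0 - c[arm])
                          for p, c in zip(post, candidates)])
    return regret


# Hypothetical example: three arms; the true environment is the third candidate.
candidates = [(0.7, 0.3, 0.3), (0.3, 0.7, 0.3), (0.3, 0.3, 0.7)]
true_theta = candidates[2]
demos = [2, 2, 1, 2, 2]  # made-up demonstrations from a fairly competent expert

uniform = [1.0 / len(candidates)] * len(candidates)
informed = informed_prior(candidates, demos, beta=5.0)
uninformative = informed_prior(candidates, demos, beta=0.0)  # zero-competence expert

runs, horizon = 200, 50
avg_uniform = sum(thompson_sampling(candidates, uniform, true_theta, horizon, random.Random(s))
                  for s in range(runs)) / runs
avg_informed = sum(thompson_sampling(candidates, informed, true_theta, horizon, random.Random(s))
                   for s in range(runs)) / runs
print(f"informed prior: {[round(p, 3) for p in informed]}")
print(f"average regret, uniform prior: {avg_uniform:.2f}; informed prior: {avg_informed:.2f}")
```

With `beta=0.0` the demonstrations leave the uniform prior unchanged, while a larger `beta` concentrates the informed prior on the candidate the expert favors. This is one way to see why the improvement should depend on demonstration quality. If the assumed expert model is misspecified, the informed prior can mislead the agent, so treat this strictly as a toy illustration.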
Related papers
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models and improve their performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- RanDumb: A Simple Approach that Questions the Efficacy of Continual Representation Learning [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to simple pre-defined random transforms.
We then train a simple linear classifier on top without storing any exemplars, processing one sample at a time in an online continual learning setting.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
arXiv Detail & Related papers (2024-02-13T22:07:29Z)
- Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior [14.232144691524528]
Recent Vision-Language Pretrained models have become the backbone for many downstream tasks.
MLE training can lead the context vector to over-fit dominant image features in the training data.
This paper presents a Bayesian framework for prompt learning that could alleviate overfitting in few-shot learning applications.
arXiv Detail & Related papers (2024-01-09T10:15:59Z)
- A Comprehensive Empirical Evaluation on Online Continual Learning [20.39495058720296]
We evaluate methods from the literature that tackle online continual learning.
We focus on the class-incremental setting in the context of image classification.
We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks.
arXiv Detail & Related papers (2023-08-20T17:52:02Z)
- Data Quality in Imitation Learning [15.939363481618738]
In offline learning for robotics, we simply lack internet-scale data, and so high-quality datasets are a necessity.
This is especially true in imitation learning (IL), a sample efficient paradigm for robot learning using expert demonstrations.
In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift.
arXiv Detail & Related papers (2023-06-04T18:48:32Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z)
- Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning from Demonstrations (SILfD) exploits an agent's own past good experiences to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments.
arXiv Detail & Related papers (2022-03-21T11:56:56Z)
- Online Continual Learning with Natural Distribution Shifts: An Empirical Study with Visual Data [101.6195176510611]
"Online" continual learning enables evaluating both information retention and online learning efficacy.
In online continual learning, each incoming small batch of data is first used for testing and then added to the training set, making the problem truly online.
We introduce a new benchmark for online continual visual learning that exhibits large scale and natural distribution shifts.
arXiv Detail & Related papers (2021-08-20T06:17:20Z)
- Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning [0.0]
We propose a novel transfer learning method for speech emotion recognition.
With as few as 125 examples per emotion class, we reached a higher accuracy than a strong baseline trained on 8 times more data.
arXiv Detail & Related papers (2020-11-11T06:18:31Z)