PAC-Bayesian Lifelong Learning For Multi-Armed Bandits
- URL: http://arxiv.org/abs/2203.03303v1
- Date: Mon, 7 Mar 2022 11:23:12 GMT
- Title: PAC-Bayesian Lifelong Learning For Multi-Armed Bandits
- Authors: Hamish Flynn, David Reeb, Melih Kandemir and Jan Peters
- Abstract summary: We present a PAC-Bayesian analysis of lifelong learning.
We consider the case when each learning task is a multi-armed bandit problem.
We propose lifelong learning algorithms that use our new bounds as learning objectives.
- Score: 38.76324445090305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a PAC-Bayesian analysis of lifelong learning. In the lifelong
learning problem, a sequence of learning tasks is observed one-at-a-time, and
the goal is to transfer information acquired from previous tasks to new
learning tasks. We consider the case when each learning task is a multi-armed
bandit problem. We derive lower bounds on the expected average reward that
would be obtained if a given multi-armed bandit algorithm was run in a new task
with a particular prior and for a set number of steps. We propose lifelong
learning algorithms that use our new bounds as learning objectives. Our
proposed algorithms are evaluated in several lifelong multi-armed bandit
problems and are found to perform better than a baseline method that does not
use generalisation bounds.
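For intuition, a McAllester/Maurer-style PAC-Bayesian bound for rewards in [0, 1] states that, with probability at least 1 - delta, E_rho[R] >= Ê_rho[R] - sqrt((KL(rho || pi) + ln(2 sqrt(n) / delta)) / (2n)); the paper derives bounds specialised to bandit algorithms run with a prior for a fixed number of steps, so this generic form is only indicative of the shape of such objectives. The sketch below only illustrates the lifelong bandit setting: tasks arrive one at a time, a prior over arm means is carried across tasks, and each task is played by a bandit algorithm started from that prior. The Gaussian/Thompson-sampling choice, the moment-matching transfer step, and all names are illustrative assumptions, not the authors' bound-optimising method.

    import numpy as np

    def run_task(true_means, prior_mean, prior_var, noise_var=1.0, n_steps=200, rng=None):
        # Gaussian Thompson sampling on one bandit task, started from the given prior.
        # Illustrative only: the paper instead selects the prior by maximising a
        # PAC-Bayesian lower bound on the expected average reward.
        if rng is None:
            rng = np.random.default_rng()
        k = len(true_means)
        post_mean = np.full(k, prior_mean, dtype=float)
        post_var = np.full(k, prior_var, dtype=float)
        rewards = []
        for _ in range(n_steps):
            arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))  # sample from the posterior
            r = rng.normal(true_means[arm], np.sqrt(noise_var))             # observe a noisy reward
            rewards.append(r)
            prec = 1.0 / post_var[arm]                                      # conjugate Gaussian update
            new_prec = prec + 1.0 / noise_var
            post_mean[arm] = (prec * post_mean[arm] + r / noise_var) / new_prec
            post_var[arm] = 1.0 / new_prec
        return rewards

    rng = np.random.default_rng(0)
    prior_mean, prior_var = 0.0, 1.0                    # prior carried across tasks
    avg_rewards = []
    for task in range(20):                              # tasks arrive one at a time
        true_means = rng.normal(0.5, 0.3, size=5)       # hypothetical task distribution
        rewards = run_task(true_means, prior_mean, prior_var, rng=rng)
        avg_rewards.append(float(np.mean(rewards)))
        # Crude transfer step (assumption): moment-match the prior to past task rewards.
        prior_mean = float(np.mean(avg_rewards))
        prior_var = float(np.var(avg_rewards)) + 1e-2
    print("average reward per task:", np.round(avg_rewards, 2))

With this heuristic transfer step, later tasks start from a prior close to the typical arm rewards, mimicking the role that the learned prior plays in the paper's algorithms.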
Related papers
- Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits [15.342585350280535]
We study how representation learning can improve the learning efficiency of contextual bandit problems.
We present a new algorithm based on alternating projected gradient descent (GD) and minimization estimator.
arXiv Detail & Related papers (2024-10-02T22:30:29Z)
- Continual Learning of Numerous Tasks from Long-tail Distributions [17.706669222987273]
Continual learning focuses on developing models that learn and adapt to new tasks while retaining previously acquired knowledge.
Existing continual learning algorithms usually involve a small number of tasks with uniform sizes and may not accurately represent real-world learning scenarios.
We propose a method that reuses the states in Adam by maintaining a weighted average of the second moments from previous tasks.
We demonstrate that our method, compatible with most existing continual learning algorithms, effectively reduces forgetting with only a small amount of additional computational or memory costs.
arXiv Detail & Related papers (2024-04-03T13:56:33Z)
- Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning [79.07658065326592]
Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning.
We provide novel multitask confidence intervals in the challenging setting when neither the similarity between tasks nor the tasks' features are available to the learner.
We propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance.
arXiv Detail & Related papers (2023-08-03T13:08:09Z)
- Self-paced Weight Consolidation for Continual Learning [39.27729549041708]
Continual learning algorithms are widely used to prevent catastrophic forgetting in sequential task learning settings.
We propose a self-paced Weight Consolidation (spWC) framework to attain continual learning.
arXiv Detail & Related papers (2023-07-20T13:07:41Z)
- Lifelong Bandit Optimization: No Prior and No Regret [70.94238868711952]
We develop LIBO, an algorithm which adapts to the environment by learning from past experience.
We assume a kernelized structure where the kernel is unknown but shared across all tasks.
Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees optimal performance.
arXiv Detail & Related papers (2022-10-27T14:48:49Z)
- You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z)
- Lifelong Learning in Multi-Armed Bandits [22.301793734117805]
We study the problem in the multi-armed bandit framework with the objective to minimize the total regret incurred over a series of tasks.
While most bandit algorithms are designed to have a low worst-case regret, we examine here the average regret over bandit instances drawn from some prior distribution which may change over time.
arXiv Detail & Related papers (2020-12-28T15:13:31Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is the squared Euclidean distance to a bias vector (a minimal sketch of this biased regularization appears after this list).
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
- Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error [50.83356836818667]
We develop a novel meta-regularization objective using a cyclical annealing schedule and a maximum mean discrepancy (MMD) criterion.
The experimental results show that our approach substantially outperforms standard meta-learning algorithms.
arXiv Detail & Related papers (2020-03-04T04:43:16Z)
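As a side note on the "Meta-learning with Stochastic Linear Bandits" entry above: regularizing towards a bias vector b replaces ridge regression's penalty lambda * ||theta||^2 with lambda * ||theta - b||^2, which has the closed form theta_hat = (X^T X + lambda I)^{-1} (X^T r + lambda b). Below is a minimal sketch under assumed variable names (they are not from the paper).

    import numpy as np

    def biased_ridge(X, r, b, lam=1.0):
        # Least squares regularized towards a bias vector b:
        #   argmin_theta ||X @ theta - r||^2 + lam * ||theta - b||^2
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ r + lam * b)

    rng = np.random.default_rng(1)
    theta_true = np.array([0.9, -0.4, 0.2])
    X = rng.normal(size=(50, 3))
    r = X @ theta_true + 0.1 * rng.normal(size=50)
    b = np.array([1.0, -0.5, 0.0])                      # hypothetical bias learned from earlier tasks
    print(biased_ridge(X, r, b, lam=5.0))

When b is close to the average of the tasks' true parameters and the task variance is small, this biased estimate needs fewer samples than fitting each task in isolation, which is the advantage that entry describes.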
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.