Learning in Distributed Contextual Linear Bandits Without Sharing the Context
- URL: http://arxiv.org/abs/2206.04180v1
- Date: Wed, 8 Jun 2022 22:07:01 GMT
- Title: Learning in Distributed Contextual Linear Bandits Without Sharing the Context
- Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli
- Abstract summary: Contextual linear bandits is a rich and theoretically important model that has many practical applications.
In this paper, we consider a distributed memoryless contextual linear bandit learning problem.
- Score: 39.70492757288025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual linear bandits is a rich and theoretically important model that
has many practical applications. Recently, this setup gained a lot of interest
in applications over wireless where communication constraints can be a
performance bottleneck, especially when the contexts come from a large
$d$-dimensional space. In this paper, we consider a distributed memoryless
contextual linear bandit learning problem, where the agents who observe the
contexts and take actions are geographically separated from the learner who
performs the learning while not seeing the contexts. We assume that contexts
are generated from a distribution and propose a method that uses $\approx 5d$
bits per context for the case of unknown context distribution and $0$ bits per
context if the context distribution is known, while achieving nearly the same
regret bound as if the contexts were directly observable. The former bound
improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of
the horizon, while the latter is information-theoretically tight.
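As a concrete illustration of sending a context with a handful of bits per dimension, below is a minimal Python sketch of stochastic per-coordinate quantization, a standard primitive that gives the learner an unbiased estimate of the context. It is a toy under stated assumptions (one bit per coordinate, contexts clipped to $[-1, 1]^d$), not the paper's actual $\approx 5d$-bit encoder.

```python
import numpy as np

def stochastic_quantize(x, lo=-1.0, hi=1.0):
    """Encode each coordinate of x with one random bit so that the decoded
    vector is an unbiased estimate of x (E[decoded] = x).
    Toy illustration; the paper's ~5d-bit scheme is more refined."""
    x = np.clip(x, lo, hi)
    p_hi = (x - lo) / (hi - lo)              # probability of rounding up
    bits = np.random.rand(*x.shape) < p_hi   # one bit per coordinate
    decoded = np.where(bits, hi, lo)         # learner's reconstruction
    return bits, decoded

ctx = np.random.uniform(-1, 1, size=8)       # a d=8 context
bits, estimate = stochastic_quantize(ctx)    # d bits on the wire
```

Because the reconstruction is unbiased, the learner's least-squares estimates remain consistent, which is why regret can survive such aggressive compression.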
Related papers
- Fair Exploration and Exploitation [4.368185344922342]
We consider the fully adversarial problem in which, apart from bounded losses, there are no assumptions whatsoever on the generation of the contexts and losses.
In our problem we assume that the context set is partitioned into a set of protected groups.
We develop FexEx, a remarkably efficient algorithm for this problem.
arXiv Detail & Related papers (2024-11-06T22:25:56Z)
- QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory [66.01597794579568]
We introduce information bottleneck (IB) theory to model the problem.
We propose a cross-attention-based approach to approximate mutual information in IB (see the sketch below).
Our method achieves a 25% increase in compression rate compared to the state-of-the-art.
arXiv Detail & Related papers (2024-08-20T02:44:45Z)
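A minimal sketch of the cross-attention scoring idea from the QUITO-X summary above: score each context token by the attention mass it receives from the query, then keep only the top fraction. The scoring rule, `keep_ratio`, and tensor shapes are assumptions made for illustration, not QUITO-X's actual architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def compress_context(query_emb, ctx_emb, keep_ratio=0.5):
    """Keep the context tokens that receive the most cross-attention from
    the query; attention mass serves as a cheap proxy for the mutual
    information between a context token and the query."""
    d = query_emb.shape[-1]
    attn = softmax(query_emb @ ctx_emb.T / np.sqrt(d))   # (n_query, n_ctx)
    scores = attn.sum(axis=0)                            # mass per ctx token
    k = max(1, int(keep_ratio * len(scores)))
    return np.sort(np.argsort(scores)[-k:])              # kept indices, in order

rng = np.random.default_rng(1)
q, c = rng.normal(size=(4, 64)), rng.normal(size=(32, 64))
kept = compress_context(q, c)   # indices of the surviving context tokens
```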
We show that it is possible to achieve an $\tilde{O}(\sqrt{T})$ regret upper bound for locally private linear contextual bandits (the privacy primitive is sketched below).
Our solution relies on several new algorithmic and analytical ideas.
arXiv Detail & Related papers (2024-04-15T02:00:24Z)
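For intuition about the "locally private" constraint in the entry above, here is a sketch of the standard Gaussian mechanism applied on the user's side, so raw data never leaves the device. This is the generic primitive only; the paper's algorithm and its $\tilde{O}(\sqrt{T})$ analysis are considerably more involved.

```python
import numpy as np

def ldp_gaussian_release(x, eps, delta, clip=1.0):
    """Release a locally differentially private view of a user's vector x:
    clip its norm, then add Gaussian noise calibrated by the classic
    analytic bound sigma = sqrt(2 ln(1.25/delta)) * clip / eps."""
    x = x * min(1.0, clip / (np.linalg.norm(x) + 1e-12))  # norm clipping
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * clip / eps
    return x + np.random.normal(0.0, sigma, size=x.shape)

private_view = ldp_gaussian_release(np.ones(5) / np.sqrt(5), eps=1.0, delta=1e-5)
```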
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA (a minimal LoRA sketch follows below).
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
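Since LLoCO's finetuning component is LoRA, here is a minimal sketch of a LoRA-augmented linear map: the pretrained weight stays frozen and only the low-rank update $(\alpha/r)\,BA$ is trained. Dimensions, rank, and scaling are illustrative choices, not LLoCO's configuration.

```python
import numpy as np

class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B train."""
    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.02, (d_out, d_in))  # frozen pretrained weight
        self.A = rng.normal(0, 0.02, (r, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, r))                # zero-init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

layer = LoRALinear(d_in=64, d_out=64)
y = layer(np.ones(64))   # initially equals the frozen layer's output
```

Zero-initializing B means training starts exactly at the pretrained behavior, which is the standard LoRA design choice.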
- Optimal cross-learning for contextual bandits with unknown context distributions [28.087360479901978]
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al.
We provide an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$, independent of the number of contexts.
At the core of our algorithm is a novel technique for coordinating the execution of an algorithm over multiple epochs.
arXiv Detail & Related papers (2024-01-03T18:02:13Z)
- First- and Second-Order Bounds for Adversarial Linear Contextual Bandits [22.367921675238318]
We consider the adversarial linear contextual bandit setting, which allows for the loss functions associated with each of $K$ arms to change over time without restriction.
Since $V_T$ or $L_T^*$ may be significantly smaller than $T$, these bounds improve over the worst-case regret whenever the environment is relatively benign (see the definitions sketched below).
arXiv Detail & Related papers (2023-05-01T14:00:15Z)
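For readers unfamiliar with the notation in the summary above, the block below sketches the standard definitions behind first-order ($L_T^*$) and second-order ($V_T$) regret bounds; the paper's exact notation and statements may differ.

```latex
% Illustrative definitions (standard usage, not the paper's exact notation):
\begin{align*}
  L_T^{*} &= \min_{\pi \in \Pi} \sum_{t=1}^{T} \ell_t\bigl(\pi(c_t)\bigr)
      && \text{cumulative loss of the best fixed policy,} \\
  V_T &= \sum_{t=1}^{T} \operatorname{Var}_t\bigl[\ell_t(a_t)\bigr]
      && \text{cumulative variance of the losses.}
\end{align*}
% For losses in [0,1], both satisfy L_T^* <= T and V_T <= T, with equality
% only in the worst case, so bounds scaling with sqrt(L_T^*) or sqrt(V_T)
% beat the sqrt(T) worst-case rate whenever the environment is benign.
```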
- An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context [109.49663559151377]
Existing works on adaptation to unknown environment contexts either assume the contexts are the same for the whole episode or assume the context variables are Markovian.
In this paper, we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method.
Our method jointly infers the belief distribution over the latent context together with the posterior over segment length, enabling more accurate belief inference from the data observed within the current segment (a change-point-style sketch follows below).
arXiv Detail & Related papers (2022-12-24T13:43:39Z)
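The joint inference over latent context and segment length described above is reminiscent of Bayesian online change-point detection; below is a minimal run-length filter in that spirit. The hazard rate and the predictive-likelihood interface are assumptions, and this is not the SeCBAD algorithm itself.

```python
import numpy as np

def update_run_length(posterior, pred_prob, hazard=0.05):
    """One step of a Bayesian online change-point filter.
    posterior[r] = P(current segment has lasted r steps | data so far);
    pred_prob[r] = likelihood of the new observation under the context
    belief conditioned on run length r. A change point plays the role
    of a switch to a new latent context."""
    growth = posterior * pred_prob * (1 - hazard)    # segment continues
    change = (posterior * pred_prob * hazard).sum()  # new segment begins
    new_post = np.concatenate(([change], growth))
    return new_post / new_post.sum()

post = np.array([0.2, 0.3, 0.5])                # segment length up to 3 so far
post = update_run_length(post, np.ones(3))      # flat likelihoods, one step
```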
- Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms [39.70492757288025]
We address the contextual linear bandit problem, where a decision maker is provided a context.
We show that the contextual problem can be solved as a linear bandit problem.
Our results imply an $O(d\sqrt{T}\log T)$ high-probability regret bound for contextual linear bandits (the reduction is sketched below).
arXiv Detail & Related papers (2022-11-08T22:18:53Z)
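One way to picture the reduction in the entry above: treat each candidate parameter vector as a linear bandit "arm" whose feature is the expected feature of its greedy action under the context distribution. The Monte-Carlo helper below is an illustrative reading of that idea, not the paper's exact construction.

```python
import numpy as np

def expected_greedy_feature(sample_context, phi, theta, n_mc=2000):
    """Estimate E_c[ phi(c)[a*(c, theta)] ], the expected feature of the
    greedy action for parameter theta; the linear bandit then operates on
    these context-free expectations instead of raw contexts."""
    total = 0.0
    for _ in range(n_mc):
        c = sample_context()                    # draw a context
        feats = phi(c)                          # (n_actions, d) feature matrix
        a_star = int(np.argmax(feats @ theta))  # greedy action under theta
        total = total + feats[a_star]
    return total / n_mc

rng = np.random.default_rng(2)
phi = lambda c: np.stack([c, -c, 0.5 * c])      # toy map: 3 actions, d = 4
gbar = expected_greedy_feature(lambda: rng.normal(size=4), phi, np.ones(4))
```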
- Communication Efficient Distributed Learning for Kernelized Contextual Bandits [58.78878127799718]
We tackle the communication efficiency challenge of learning kernelized contextual bandits in a distributed setting.
We consider non-linear reward mappings, by letting agents collaboratively search in a reproducing kernel Hilbert space.
We rigorously prove that our algorithm attains a sub-linear rate in both regret and communication cost.
arXiv Detail & Related papers (2022-06-10T01:39:15Z)
- Measuring and Increasing Context Usage in Context-Aware Machine Translation [64.5726087590283]
We introduce a new metric, conditional cross-mutual information, to quantify the usage of context by machine translation models.
We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models (sketched below).
arXiv Detail & Related papers (2021-05-07T19:55:35Z)
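Context-aware word dropout, as described above, amounts to randomly masking context tokens during training so the model cannot lean on any single context word. A toy sketch; the mask probability and mask token are assumptions.

```python
import random

def context_word_dropout(context_tokens, p=0.2, mask_token="<mask>"):
    """Replace each context token with mask_token independently with
    probability p; applied only to the context, not the source sentence."""
    return [mask_token if random.random() < p else tok
            for tok in context_tokens]

random.seed(0)
print(context_word_dropout("the cat sat on the mat".split(), p=0.3))
```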
- Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias [113.44471186752018]
Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy.
This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations.
arXiv Detail & Related papers (2020-01-09T18:31:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.