Remote Contextual Bandits
- URL: http://arxiv.org/abs/2202.05182v1
- Date: Thu, 10 Feb 2022 17:31:20 GMT
- Title: Remote Contextual Bandits
- Authors: Francesco Pase, Deniz Gunduz, Michele Zorzi
- Abstract summary: We consider a remote contextual multi-armed bandit (CMAB) problem.
The decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel.
We study the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret achieved when the Thompson sampling strategy is adopted.
- Score: 18.40166098572039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a remote contextual multi-armed bandit (CMAB) problem, in which
the decision-maker observes the context and the reward, but must communicate
the actions to be taken by the agents over a rate-limited communication
channel. This can model, for example, a personalized ad placement application,
where the content owner observes the individual visitors to its website, and
hence has the context information, but must convey the ads that must be shown
to each visitor to a separate entity that manages the marketing content. In
this remote CMAB (R-CMAB) problem, the constraint on the communication rate
between the decision-maker and the agents imposes a trade-off between the
number of bits sent per agent and the acquired average reward. We are
particularly interested in characterizing the rate required to achieve
sub-linear regret. Consequently, this can be considered a policy compression
problem, where the distortion metric is induced by the learning objectives. We
first study the fundamental information theoretic limits of this problem by
letting the number of agents go to infinity, and study the regret achieved when
the Thompson sampling strategy is adopted. In particular, we identify two distinct
rate regions resulting in linear and sub-linear regret behavior, respectively.
Then, we provide upper bounds on the achievable regret when the decision-maker
can reliably transmit the policy without distortion.
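The Thompson sampling strategy referenced in the abstract can be illustrated with a minimal sketch for an unconstrained contextual bandit (the Bernoulli-reward setup, Beta priors, and all names below are illustrative assumptions, not the paper's construction; the paper's rate-limited channel between decision-maker and agents is omitted here):

```python
import random

def thompson_sampling(n_contexts, n_arms, reward_prob, n_rounds, seed=0):
    """Minimal Thompson sampling for a contextual bandit with Bernoulli
    rewards and independent Beta(1, 1) priors per (context, arm) pair."""
    rng = random.Random(seed)
    # Beta posterior parameters: alpha counts successes + 1, beta failures + 1.
    alpha = [[1.0] * n_arms for _ in range(n_contexts)]
    beta = [[1.0] * n_arms for _ in range(n_contexts)]
    total_reward = 0
    for _ in range(n_rounds):
        ctx = rng.randrange(n_contexts)  # decision-maker observes the context
        # Sample a mean-reward estimate from each arm's posterior and
        # play the arm whose sample is largest.
        samples = [rng.betavariate(alpha[ctx][a], beta[ctx][a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < reward_prob[ctx][arm] else 0
        # Update the posterior for the played (context, arm) pair.
        alpha[ctx][arm] += reward
        beta[ctx][arm] += 1 - reward
        total_reward += reward
    return total_reward

# Two contexts, two arms; the best arm differs per context, so the
# context information is essential for low regret.
probs = [[0.9, 0.1], [0.2, 0.8]]
print(thompson_sampling(2, 2, probs, n_rounds=5000))
```

In the remote setting studied by the paper, the sampled action would additionally have to be conveyed to the agent over the rate-limited channel, which is what induces the trade-off between bits per agent and average reward.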
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
arXiv Detail & Related papers (2024-02-15T19:18:47Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific, possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Cooperative Actor-Critic via TD Error Aggregation [12.211031907519827]
We introduce a decentralized actor-critic algorithm with TD error aggregation that preserves the agents' privacy.
We provide a convergence analysis under diminishing step size to verify that the agents maximize the team-average objective function.
arXiv Detail & Related papers (2022-07-25T21:10:39Z) - Rate-Constrained Remote Contextual Bandits [18.40166098572039]
We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem.
We study the fundamental information theoretic limits of this problem by letting the number of agents go to infinity.
We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric.
arXiv Detail & Related papers (2022-04-26T22:34:54Z) - Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) [73.59962178534361]
We study an influence problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate.
In this setting, an explore-exploit approach could be used to learn the key underlying diffusion parameters, while running the campaign.
We describe and compare two methods of contextual multi-armed bandits, with upper-confidence bounds on the remaining potential of influencers.
arXiv Detail & Related papers (2022-01-13T22:06:10Z) - Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z) - Learning Selective Communication for Multi-Agent Path Finding [18.703918339797283]
Decision Causal Communication (DCC) is a simple yet efficient model that enables agents to select which neighbors to communicate with.
DCC is suitable for decentralized execution to handle large scale problems.
arXiv Detail & Related papers (2021-09-12T03:07:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.