On Bits and Bandits: Quantifying the Regret-Information Trade-off
- URL: http://arxiv.org/abs/2405.16581v4
- Date: Sun, 23 Feb 2025 13:06:05 GMT
- Title: On Bits and Bandits: Quantifying the Regret-Information Trade-off
- Authors: Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
- Abstract summary: We study the trade-off between the information an agent accumulates and the regret it suffers. We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. We also prove regret upper bounds using the amount of information the agent accumulates.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many sequential decision problems, an agent performs a repeated task. It then suffers regret and obtains information that it may use in the following rounds. However, sometimes the agent may also obtain information and avoid suffering regret by querying external sources. We study the trade-off between the information an agent accumulates and the regret it suffers. We invoke information-theoretic methods for obtaining regret lower bounds that also allow us to easily re-derive several known lower bounds. We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. We also prove regret upper bounds using the amount of information the agent accumulates. These bounds show that information, measured in bits, can be traded off for regret, measured in reward. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.
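The bits-for-regret trade-off can be pictured with a toy model (an illustrative sketch, not the paper's algorithm): a two-armed Bernoulli bandit in which the agent may either pull an arm, suffering regret while learning, or spend a limited budget of free external queries. Accumulated information is tracked as the reduction, in bits, of the entropy of the belief about which arm is best. The query mechanism, budget, and all function names below are hypothetical choices made for this sketch.

```python
import math
import random

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def estimate_p_best(alpha, beta, rng, n=100):
    """Monte-Carlo estimate of P(arm 1 is best) under Beta posteriors."""
    wins = sum(
        rng.betavariate(alpha[1], beta[1]) > rng.betavariate(alpha[0], beta[0])
        for _ in range(n)
    )
    return wins / n

def run(horizon=300, query_budget=20, seed=0):
    rng = random.Random(seed)
    means = [0.4, 0.6]            # true (unknown) arm means
    best = max(means)
    alpha, beta = [1, 1], [1, 1]  # Beta(1,1) priors over each arm's mean
    regret = 0.0
    bits = 0.0
    queries = query_budget
    for _ in range(horizon):
        # Uncertainty about which arm is best, before acting.
        h_before = entropy(estimate_p_best(alpha, beta, rng))
        if queries > 0 and h_before > 0.5:
            # Query an external source: no regret, observe a sample
            # from a uniformly chosen arm (a hypothetical query model).
            arm = rng.randrange(2)
            queries -= 1
        else:
            # Thompson sampling: play the arm with the larger posterior draw.
            draw = [rng.betavariate(alpha[i], beta[i]) for i in range(2)]
            arm = max(range(2), key=lambda i: draw[i])
            regret += best - means[arm]
        r = 1 if rng.random() < means[arm] else 0
        alpha[arm] += r
        beta[arm] += 1 - r
        # Information gained this round, in bits (clamped Monte-Carlo estimate).
        bits += max(0.0, h_before - entropy(estimate_p_best(alpha, beta, rng)))
    return regret, bits
```

Running the sketch with and without a query budget shows the qualitative effect the bounds formalize: free bits acquired early reduce the regret suffered later.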
Related papers
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Language Models Can Reduce Asymmetry in Information Markets [100.38786498942702]
We introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants.
The central mechanism enabling this marketplace is the agents' dual capabilities: they have the capacity to assess the quality of privileged information but also come equipped with the ability to forget.
To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information.
arXiv Detail & Related papers (2024-03-21T14:48:37Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it.
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [50.508669199496474]
We develop a ReAct-style LLM agent with the ability to reason and act upon external knowledge.
We refine the agent through a ReST-like method that iteratively trains on previous trajectories.
Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model.
arXiv Detail & Related papers (2023-12-15T18:20:15Z) - Remote Contextual Bandits [18.40166098572039]
We consider a remote contextual multi-armed bandit (CMAB) problem.
The decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel.
We study the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret achieved when a Thompson sampling strategy is adopted.
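One way to picture the rate limitation (a hypothetical sketch, not the paper's scheme, and dropping the context for simplicity): the decision-maker runs Thompson sampling over K arms but can only transmit b bits per round, so the chosen action must be encoded as an index into a 2^b-entry codebook, here taken to be the arms with the highest posterior means.

```python
import random

def rate_limited_thompson(K=8, bits=2, horizon=500, seed=1):
    """Thompson sampling where the chosen action is sent over a channel
    of `bits` bits per round: the sampled arm is mapped to an index in a
    2**bits-entry codebook of the highest-posterior-mean arms, falling
    back to the codebook's best entry when the sampled arm is not covered."""
    rng = random.Random(seed)
    means = [rng.random() for _ in range(K)]  # true (unknown) arm means
    best = max(means)
    alpha = [1] * K
    beta = [1] * K
    regret = 0.0
    M = 2 ** bits
    for _ in range(horizon):
        post_mean = [alpha[i] / (alpha[i] + beta[i]) for i in range(K)]
        codebook = sorted(range(K), key=lambda i: -post_mean[i])[:M]
        draw = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        want = max(range(K), key=lambda i: draw[i])
        # The channel can only carry an index into the codebook.
        arm = want if want in codebook else codebook[0]
        r = 1 if rng.random() < means[arm] else 0
        alpha[arm] += r
        beta[arm] += 1 - r
        regret += best - means[arm]
    return regret
```

Shrinking `bits` below log2(K) forces the sampled action through a lossy codebook, which is the flavor of constraint whose fundamental limits the paper analyzes.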
arXiv Detail & Related papers (2022-02-10T17:31:20Z) - Transfer in Reinforcement Learning via Regret Bounds for Learning Agents [2.023315598404668]
We show that when agents share their observations the total regret of all agents is smaller by a factor of $\sqrt{\aleph}$.
This result demonstrates how considering the regret in multi-agent settings can provide theoretical bounds on the benefit of sharing observations in transfer learning.
arXiv Detail & Related papers (2022-02-02T18:10:21Z) - Online Transfer Learning: Negative Transfer and Effect of Prior Knowledge [6.193838300896449]
We study the online transfer learning problems where the source samples are given in an offline way while the target samples arrive sequentially.
We define the expected regret of the online transfer learning problem and provide upper bounds on the regret using information-theoretic quantities.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-05-04T12:12:14Z) - Multi-Armed Bandits with Local Differential Privacy [32.538737981996405]
In bandit systems, the rewards may refer to the users' activities, which may involve private information.
We study the regret upper and lower bounds for MAB algorithms with a given LDP guarantee.
We prove a lower bound and propose algorithms whose regret upper bounds match the lower bound up to constant factors.
arXiv Detail & Related papers (2020-07-06T23:36:20Z) - Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out the inability to infer behavioral conclusions from probing results.
We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
arXiv Detail & Related papers (2020-06-01T15:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences of its use.