On Bits and Bandits: Quantifying the Regret-Information Trade-off
- URL: http://arxiv.org/abs/2405.16581v1
- Date: Sun, 26 May 2024 14:18:38 GMT
- Title: On Bits and Bandits: Quantifying the Regret-Information Trade-off
- Authors: Itai Shufaro, Nadav Merlis, Nir Weinberger, Shie Mannor
- Abstract summary: In interactive decision-making tasks, information can be acquired through direct interaction, indirect feedback, and external knowledgeable sources.
We show that information from external sources, measured in bits, can be traded off for regret, measured in reward.
We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates.
- Score: 62.64904903955711
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In interactive decision-making tasks, information can be acquired through direct interaction, indirect feedback, and external knowledgeable sources. We examine the trade-off between the information an agent accumulates and the regret it suffers. We show that information from external sources, measured in bits, can be traded off for regret, measured in reward. We invoke information-theoretic methods for obtaining regret lower bounds that also allow us to easily re-derive several known lower bounds. We then generalize a variety of interactive decision-making tasks with external information to a new setting. Using this setting, we introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. These lower bounds also prove the near-optimality of Thompson sampling for Bayesian problems. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.
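A useful anchor for the bits-for-regret claim above is the classical information-theoretic analysis of Thompson sampling by Russo and Van Roy, which bounds the Bayesian regret by $\sqrt{\bar{\Gamma}\, H(A^*)\, T}$, where $H(A^*)$ is the entropy of the optimal action and $\bar{\Gamma}$ is the information ratio: external bits that shrink $H(A^*)$ tighten the bound. The Python sketch below illustrates this effect empirically with Bernoulli Thompson sampling. It is a minimal illustration, not the paper's method; the arm means, horizon, and prior pseudo-counts are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm): Bernoulli Thompson sampling,
# showing how prior information ("bits" about the best arm) trades off
# against cumulative regret.
import numpy as np

def thompson_sampling(means, horizon, prior_alpha=None, prior_beta=None, seed=0):
    """Bernoulli Thompson sampling; returns cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    k = len(means)
    # Beta(1, 1) on every arm encodes zero external bits about which arm is
    # best; a sharper prior stands in for information from an external source.
    alpha = np.ones(k) if prior_alpha is None else np.asarray(prior_alpha, float).copy()
    beta = np.ones(k) if prior_beta is None else np.asarray(prior_beta, float).copy()
    best, regret = max(means), 0.0
    for _ in range(horizon):
        samples = rng.beta(alpha, beta)        # one draw per arm from its posterior
        arm = int(np.argmax(samples))          # act greedily on the posterior sample
        reward = rng.binomial(1, means[arm])   # Bernoulli feedback from the chosen arm
        alpha[arm] += reward                   # conjugate posterior update
        beta[arm] += 1 - reward
        regret += best - means[arm]            # expected regret of this pull
    return regret

means = [0.5, 0.6, 0.7]                        # illustrative arm means
print("flat prior:    ", thompson_sampling(means, 5000))
# "Informed" prior: pseudo-counts concentrating belief on arm 2, worth
# roughly log2(3) ~ 1.6 bits about the identity of the best arm.
print("informed prior:", thompson_sampling(means, 5000,
                                           prior_alpha=[1, 1, 8],
                                           prior_beta=[8, 8, 1]))
```

Under the informed prior, roughly $\log_2 3 \approx 1.6$ bits identifying the best of three arms, cumulative regret drops noticeably relative to the flat prior, which is exactly the kind of trade-off the abstract quantifies.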
Related papers
- PropMEND: Hypernetworks for Knowledge Propagation in LLMs [82.99849359892112]
We present a hypernetwork-based approach for knowledge propagation, named PropMEND. We show an almost 2x accuracy improvement on challenging multi-hop questions whose answers are not explicitly stated in the injected fact. We also introduce a new dataset, Controlled RippleEdit, to evaluate the generalization of our hypernetwork.
arXiv Detail & Related papers (2025-06-10T15:44:19Z) - Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z) - Language Models Can Reduce Asymmetry in Information Markets [100.38786498942702]
We introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants.
The central mechanism enabling this marketplace is the agents' dual capabilities: they can assess the quality of privileged information and are also equipped with the ability to forget.
To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information.
arXiv Detail & Related papers (2024-03-21T14:48:37Z) - Multi-Armed Bandits with Abstention [62.749500564313834]
We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic element: abstention.
In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the instantaneous reward before observing it (a minimal sketch of this mechanism appears after this list).
arXiv Detail & Related papers (2024-02-23T06:27:12Z) - ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [50.508669199496474]
We develop a ReAct-style LLM agent with the ability to reason and act upon external knowledge.
We refine the agent through a ReST-like method that iteratively trains on previous trajectories.
Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model.
arXiv Detail & Related papers (2023-12-15T18:20:15Z) - Remote Contextual Bandits [18.40166098572039]
We consider a remote contextual multi-armed bandit (CMAB) problem.
The decision-maker observes the context and the reward, but must communicate the actions to be taken by the agents over a rate-limited communication channel.
We study the fundamental information-theoretic limits of this problem by letting the number of agents go to infinity, and study the regret achieved when the Thompson sampling strategy is adopted.
arXiv Detail & Related papers (2022-02-10T17:31:20Z) - Transfer in Reinforcement Learning via Regret Bounds for Learning Agents [2.023315598404668]
We show that when agents share their observations, the total regret of all agents is smaller by a factor of $\sqrt{\aleph}$.
This result demonstrates how considering the regret in multi-agent settings can provide theoretical bounds on the benefit of sharing observations in transfer learning.
arXiv Detail & Related papers (2022-02-02T18:10:21Z) - Online Transfer Learning: Negative Transfer and Effect of Prior Knowledge [6.193838300896449]
We study online transfer learning problems in which the source samples are given offline while the target samples arrive sequentially.
We define the expected regret of the online transfer learning problem and provide upper bounds on the regret using information-theoretic quantities.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-05-04T12:12:14Z) - Multi-Armed Bandits with Local Differential Privacy [32.538737981996405]
In bandit systems, the rewards may reflect users' activities, which can involve private information.
We study regret upper and lower bounds for MAB algorithms with a given local differential privacy (LDP) guarantee.
We prove a lower bound and propose algorithms whose regret upper bounds match the lower bound up to constant factors.
arXiv Detail & Related papers (2020-07-06T23:36:20Z) - Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals [53.484562601127195]
We point out the inability to infer behavioral conclusions from probing results.
We offer an alternative method that focuses on how the information is being used, rather than on what information is encoded.
arXiv Detail & Related papers (2020-06-01T15:00:11Z)
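For the abstention variant summarized above (Multi-Armed Bandits with Abstention), a minimal sketch of the mechanism may help: before the instantaneous reward is observed, the agent may decline it in exchange for a fixed fallback payoff. The abstention rule below (abstain when the posterior sample for the chosen arm falls short of the fallback) and all numbers are illustrative assumptions, not the paper's algorithm.

```python
# Hedged sketch of a bandit with abstention: the agent may reject the
# not-yet-observed reward in favor of a known fallback payoff.
import numpy as np

def ts_with_abstention(means, horizon, fallback=0.55, seed=0):
    """Thompson sampling where the agent may decline the unseen reward."""
    rng = np.random.default_rng(seed)
    k = len(means)
    alpha, beta = np.ones(k), np.ones(k)       # flat Beta(1, 1) priors per arm
    total = 0.0
    for _ in range(horizon):
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        # The abstention decision is made BEFORE the reward is observed,
        # matching the quoted snippet; abstaining yields the known fallback
        # and, as an illustrative modeling choice, no feedback this round.
        if samples[arm] < fallback:
            total += fallback
            continue
        reward = rng.binomial(1, means[arm])   # accept: observe Bernoulli reward
        alpha[arm] += reward                   # posterior update on accepted pulls
        beta[arm] += 1 - reward
        total += reward
    return total

print(ts_with_abstention([0.4, 0.5, 0.7], 2000))
```

Early on, when posterior samples are unreliable, the fallback acts as a floor on the per-round payoff; as the posterior concentrates on the best arm, abstentions become rare.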
This list is automatically generated from the titles and abstracts of the papers in this site.