Deciding What to Learn: A Rate-Distortion Approach
- URL: http://arxiv.org/abs/2101.06197v1
- Date: Fri, 15 Jan 2021 16:22:49 GMT
- Title: Deciding What to Learn: A Rate-Distortion Approach
- Authors: Dilip Arumugam and Benjamin Van Roy
- Abstract summary: In a complex environment, aiming to synthesize an optimal policy can become infeasible.
We automate the process of translating a designer's preferences into a learning target for an agent.
We show improvements over Thompson sampling in identifying an optimal policy.
- Score: 21.945359614094503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents that learn to select optimal actions represent a prominent focus of
the sequential decision-making literature. In the face of a complex environment
or constraints on time and resources, however, aiming to synthesize such an
optimal policy can become infeasible. These scenarios give rise to an important
trade-off between the information an agent must acquire to learn and the
sub-optimality of the resulting policy. While an agent designer has a
preference for how this trade-off is resolved, existing approaches further
require that the designer translate these preferences into a fixed learning
target for the agent. In this work, leveraging rate-distortion theory, we
automate this process such that the designer need only express their
preferences via a single hyperparameter and the agent is endowed with the
ability to compute its own learning targets that best achieve the desired
trade-off. We establish a general bound on expected discounted regret for an
agent that decides what to learn in this manner along with computational
experiments that illustrate the expressiveness of designer preferences and even
show improvements over Thompson sampling in identifying an optimal policy.
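To make the rate-distortion idea concrete: the classic Blahut-Arimoto algorithm computes a channel from environments to learning targets that trades off information rate against distortion. The sketch below is a generic illustration under simplifying assumptions (a finite set of candidate environments with a prior, a finite set of candidate targets, a distortion matrix such as expected regret, and a single trade-off hyperparameter `beta`); it is not the authors' implementation, and all names are invented for the example.

```python
import numpy as np

def blahut_arimoto(prior, distortion, beta, iters=500, tol=1e-10):
    """Rate-distortion-optimal channel q(target | environment).

    prior      : (E,)  belief over candidate environments
    distortion : (E,T) cost of adopting target t when the environment is e
    beta       : trade-off hyperparameter (larger -> less distortion, more rate)
    """
    E, T = distortion.shape
    q_t = np.full(T, 1.0 / T)                     # marginal over targets
    cond = np.full((E, T), 1.0 / T)               # channel q(t | e)
    for _ in range(iters):
        # Channel update: q(t|e) proportional to q(t) * exp(-beta * d(e, t)).
        log_cond = np.log(q_t)[None, :] - beta * distortion
        log_cond -= log_cond.max(axis=1, keepdims=True)   # numerical stability
        cond = np.exp(log_cond)
        cond /= cond.sum(axis=1, keepdims=True)
        new_q_t = prior @ cond                    # marginal update
        if np.abs(new_q_t - q_t).max() < tol:
            q_t = new_q_t
            break
        q_t = new_q_t
    # Rate I(E; T) in nats and expected distortion under the channel.
    with np.errstate(divide="ignore", invalid="ignore"):
        kl_terms = cond * np.log(cond / q_t[None, :])
    rate = float(np.sum(prior[:, None] * np.nan_to_num(kl_terms)))
    dist = float(np.sum(prior[:, None] * cond * distortion))
    return cond, rate, dist

# Toy usage: three plausible environments, four candidate learning targets,
# with distortion interpreted as the expected regret of each target.
prior = np.array([0.5, 0.3, 0.2])
regret = np.array([[0.0, 0.2, 0.5, 0.9],
                   [0.4, 0.0, 0.3, 0.8],
                   [0.7, 0.5, 0.0, 0.2]])
channel, rate, dist = blahut_arimoto(prior, regret, beta=5.0)
print(f"rate = {rate:.3f} nats, expected distortion = {dist:.3f}")
```

Sweeping `beta` traces out the rate-distortion curve: small values yield learning targets that require little information to identify but tolerate more regret, while large values push the target toward the optimal policy.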
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Satisficing Exploration for Deep Reinforcement Learning [26.73584163318647]
In complex environments that approach the vastness and scale of the real world, attaining optimal performance may in fact be an entirely intractable endeavor.
Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions.
We extend an agent that directly represents uncertainty over the optimal value function, allowing it both to bypass the need for model-based planning and to learn satisficing policies.
arXiv Detail & Related papers (2024-07-16T21:28:03Z)
- Data-Driven Goal Recognition Design for General Behavioral Agents [14.750023724230774]
We introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models.
We propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments.
arXiv Detail & Related papers (2024-04-03T20:38:22Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework in which, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects arms on the agent's behalf according to its own, possibly unknown, policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner; a toy simulation of this protocol is sketched below.
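The sketch assumes Bernoulli arm rewards and known stochastic mediator policies, and uses a simple posterior-sampling rule to choose mediators; it illustrates the setting, not the authors' strategy, and every name in it is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_mediators, horizon = 5, 3, 2000
arm_means = rng.uniform(0.1, 0.9, n_arms)               # Bernoulli reward means
# Each mediator pulls arms according to its own fixed stochastic policy,
# assumed known to the learner.
policies = rng.dirichlet(np.ones(n_arms), n_mediators)  # shape (M, A)

pulls = np.zeros(n_arms)
successes = np.zeros(n_arms)
for t in range(horizon):
    # Posterior-sampling step: draw a plausible mean for every arm from a
    # Beta posterior, then pick the mediator whose known policy gives the
    # highest expected sampled value.
    sampled = rng.beta(successes + 1.0, pulls - successes + 1.0)
    mediator = int(np.argmax(policies @ sampled))
    arm = rng.choice(n_arms, p=policies[mediator])       # mediator acts
    reward = float(rng.random() < arm_means[arm])        # learner observes
    pulls[arm] += 1.0
    successes[arm] += reward

print("estimated best arm:", int(np.argmax(successes / np.maximum(pulls, 1.0))))
print("true best arm:     ", int(np.argmax(arm_means)))
```

Because the learner only controls which mediator acts, information about each arm arrives at a rate dictated by the mediators' policies, which is what makes the pure-exploration problem non-trivial.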
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling from a Gaussian distribution to fine-tune the elements of the prompt trajectory and using a preference ranking function to find the optimization direction; a generic sketch of this kind of ranking-based search is given below.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
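Read literally, this is a ranking-based stochastic search over prompt parameters. The sketch below is a minimal, generic version of that idea with a stand-in preference score; it is not the paper's algorithm, and `preference_score` and all hyperparameters are invented for illustration.

```python
import numpy as np

def preference_score(prompt: np.ndarray) -> float:
    """Stand-in for a learned preference-ranking signal (purely illustrative)."""
    target = np.linspace(-1.0, 1.0, prompt.size)   # hypothetical "good" prompt
    return -float(np.sum((prompt - target) ** 2))

rng = np.random.default_rng(0)
prompt = rng.normal(size=16)          # elements of the prompt trajectory
sigma, lr, n_samples, n_top = 0.1, 0.5, 32, 4

for step in range(200):
    # Randomly sample Gaussian perturbations of the prompt elements.
    noise = rng.normal(scale=sigma, size=(n_samples, prompt.size))
    scores = np.array([preference_score(prompt + z) for z in noise])
    # Rank the perturbed prompts and step toward the top-ranked ones:
    # the ranking, not the raw scores, determines the update direction.
    top = noise[np.argsort(scores)[-n_top:]]
    prompt = prompt + lr * top.mean(axis=0)

print("final preference score:", round(preference_score(prompt), 4))
```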
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
- Dynamic Memory for Interpretable Sequential Optimisation [0.0]
We present a solution to handling non-stationarity that is suitable for deployment at scale.
We develop an adaptive Bayesian learning agent that employs a novel form of dynamic memory.
We describe the architecture of a large-scale deployment of automatic optimisation-as-a-service.
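One common way to realise a decaying memory of this kind, sketched below purely for illustration (it is not the paper's dynamic-memory mechanism), is to discount a Bayesian agent's posterior pseudo-counts toward the prior at every step, so stale evidence fades and the agent can track a changing environment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon, gamma = 3, 5000, 0.995   # gamma < 1 sets the forgetting rate

alpha = np.ones(n_arms)                   # Beta posterior pseudo-counts
beta_ = np.ones(n_arms)
arm_means = np.array([0.2, 0.5, 0.8])     # non-stationary: flips mid-run

for t in range(horizon):
    if t == horizon // 2:
        arm_means = arm_means[::-1]       # abrupt change: best arm moves
    arm = int(np.argmax(rng.beta(alpha, beta_)))    # Thompson sampling
    reward = float(rng.random() < arm_means[arm])
    # Decay all pseudo-counts toward the prior before updating, so old
    # evidence fades and the posterior can track the changed environment.
    alpha = 1.0 + gamma * (alpha - 1.0)
    beta_ = 1.0 + gamma * (beta_ - 1.0)
    alpha[arm] += reward
    beta_[arm] += 1.0 - reward

print("posterior means:", np.round(alpha / (alpha + beta_), 3))
```

The forgetting rate `gamma` plays the role of the memory horizon: values close to 1 average over long histories, while smaller values react faster but estimate more noisily.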
arXiv Detail & Related papers (2022-06-28T12:29:13Z)
- Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
- Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies [79.60322329952453]
We show how to develop interpretable representations of how agents make decisions.
By understanding the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse of an online learning problem.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
arXiv Detail & Related papers (2022-03-14T17:40:42Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely, Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces optimal long-term cumulative utility for the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- The Value of Information When Deciding What to Learn [21.945359614094503]
This work builds upon the seminal design principle of information-directed sampling (Russo & Van Roy, 2014).
We offer new insights into learning targets from the literature on rate-distortion theory before turning to empirical results that confirm the value of information when deciding what to learn.
arXiv Detail & Related papers (2021-10-26T19:23:12Z)
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions [80.49176924360499]
We establish a framework for directing a society of simple, specialized, self-interested agents to solve sequential decision problems.
We derive a class of decentralized reinforcement learning algorithms.
We demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
arXiv Detail & Related papers (2020-07-05T16:41:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.