Bounding the Optimal Value Function in Compositional Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.02557v2
- Date: Tue, 13 Jun 2023 22:18:35 GMT
- Title: Bounding the Optimal Value Function in Compositional Reinforcement
Learning
- Authors: Jacob Adamczyk and Volodymyr Makarenko and Argenis Arriojas and Stas
Tiomkin and Rahul V. Kulkarni
- Abstract summary: We show that the optimal solution for a composite task can be related to the known primitive task solutions.
We also show that the regret of using a zero-shot policy can be bounded for this class of functions.
- Score: 2.7998963147546148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of reinforcement learning (RL), agents are often tasked with
solving a variety of problems differing only in their reward functions. In
order to quickly obtain solutions to unseen problems with new reward functions,
a popular approach involves functional composition of previously solved tasks.
However, previous work using such functional composition has primarily focused
on specific instances of composition functions whose limiting assumptions allow
for exact zero-shot composition. Our work unifies these examples and provides a
more general framework for compositionality in both standard and
entropy-regularized RL. We find that, for a broad class of functions, the
optimal solution for the composite task of interest can be related to the known
primitive task solutions. Specifically, we present double-sided inequalities
relating the optimal composite value function to the value functions for the
primitive tasks. We also show that the regret of using a zero-shot policy can
be bounded for this class of functions. The derived bounds can be used to
develop clipping approaches for reducing uncertainty during training, allowing
agents to quickly adapt to new tasks.
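The clipping idea from the abstract can be sketched in a few lines. The following is an illustrative tabular Q-learning step, not the authors' implementation; the arrays `L` and `U` are assumed to hold the double-sided bounds on the composite optimal value function derived from the primitive-task solutions.

```python
import numpy as np

def clipped_q_update(Q, L, U, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on the composite task, with the
    bootstrapped target clipped into the known interval [L, U]."""
    target = r + gamma * Q[s_next].max()
    # Double-sided bounds L(s, a) <= Q*(s, a) <= U(s, a) from the
    # primitive tasks let us discard provably incorrect target values,
    # reducing uncertainty early in training.
    target = np.clip(target, L[s, a], U[s, a])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

With uninformative bounds (`L = -inf`, `U = +inf`) this reduces to ordinary Q-learning, so the clipping is a strict refinement rather than a change of algorithm.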
Related papers
- Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided
Bounds on the Value Function [4.48890356952206]
We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
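One standard instance of such double-sided bounds comes from the classical Bellman-residual argument: for any approximation Q̃ with residual δ = TQ̃ − Q̃, the optimal value satisfies Q̃ + min δ/(1−γ) ≤ Q* ≤ Q̃ + max δ/(1−γ). A tabular sketch (illustrative names, not the paper's code):

```python
import numpy as np

def value_bounds(Q_approx, R, P, gamma=0.9):
    """Double-sided bounds on Q* from an arbitrary approximation.

    R[s, a]: rewards; P[s, a, s']: transition probabilities.
    Applies the classical Bellman-residual bound
    Q_approx + min(delta)/(1-gamma) <= Q* <= Q_approx + max(delta)/(1-gamma).
    """
    V_next = Q_approx.max(axis=1)      # greedy state values under Q_approx
    TQ = R + gamma * P @ V_next        # one Bellman backup
    delta = TQ - Q_approx              # Bellman residual
    lower = Q_approx + delta.min() / (1 - gamma)
    upper = Q_approx + delta.max() / (1 - gamma)
    return lower, upper
```

The gap between the bounds shrinks as the residual becomes more uniform, and collapses to zero when Q_approx is a fixed point.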
arXiv Detail & Related papers (2023-02-19T21:47:24Z)
- ReMIX: Regret Minimization for Monotonic Value Function Factorization in
Multiagent Reinforcement Learning [10.741140541225604]
We study the optimal projection of an unrestricted mixing function onto monotonic function classes.
We use the Lagrangian multiplier method to obtain the closed-form optimal projection weights.
Our results on Predator-Prey and StarCraft Multiagent Challenge environments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-11T03:52:51Z)
- Efficient Planning in Combinatorial Action Spaces with Applications to
Cooperative Multi-Agent Reinforcement Learning [16.844525262228103]
In cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a blow-up in the action space by the number of agents.
As a minimal requirement, we assume access to an argmax oracle that enables efficient computation of the greedy policy for any Q-function in the model class.
We propose efficient algorithms for this setting whose compute and query complexity are polynomial in all relevant problem parameters.
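The argmax-oracle assumption is easy to instantiate when the joint Q-function decomposes additively over agents: the greedy joint action then factorizes into independent per-agent maximizations. A hypothetical sketch of such an oracle (not from the paper):

```python
import numpy as np

def factored_argmax_oracle(q_per_agent):
    """Greedy joint action for an additively factored Q-function.

    q_per_agent: one 1-D array per agent, where entry i of array k is
    agent k's contribution Q_k(a_i) to the joint value. Because the
    joint Q is the sum of per-agent terms, the argmax decomposes, and
    the cost is the sum of the action-set sizes rather than their
    product -- avoiding the blow-up in the combinatorial action space.
    """
    return tuple(int(np.argmax(q)) for q in q_per_agent)
```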
arXiv Detail & Related papers (2023-02-08T23:42:49Z)
- Multi-Task Learning with Prior Information [5.770309971945476]
We propose a multi-task learning framework, where we utilize prior knowledge about the relations between features.
We also impose a penalty on changes in the coefficients for each specific feature, ensuring that related tasks have similar coefficients on the common features they share.
arXiv Detail & Related papers (2023-01-04T12:48:05Z)
- Utilizing Prior Solutions for Reward Shaping and Composition in
Entropy-Regularized Reinforcement Learning [3.058685580689605]
We develop a general framework for reward shaping and task composition in entropy-regularized RL.
We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL.
We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL.
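A small numerical experiment illustrates the kind of relation such composition frameworks exploit: in entropy-regularized RL, the optimal soft value function for an averaged reward is bounded above by the average of the primitive optimal value functions. A sketch using soft Q-iteration on a random tabular MDP (illustrative code, not the paper's):

```python
import numpy as np

def soft_q(R, P, beta=5.0, gamma=0.9, iters=300):
    """Soft (entropy-regularized) Q-iteration on a tabular MDP."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        # Soft maximum (log-sum-exp) replaces the hard max over actions.
        V = np.log(np.exp(beta * Q).sum(axis=1)) / beta
        Q = R + gamma * P @ V
    return Q

rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalize transitions
R1, R2 = rng.random((S, A)), rng.random((S, A))

Q1, Q2 = soft_q(R1, P), soft_q(R2, P)
Q_comp = soft_q(0.5 * (R1 + R2), P)        # composite (averaged) task
# Compositionality upper bound: the averaged-reward optimum never
# exceeds the average of the primitive optima.
assert (Q_comp <= 0.5 * (Q1 + Q2) + 1e-8).all()
```

The upper bound follows from the convexity of log-sum-exp; the cited works characterize when such relations become exact and how to bound them from below.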
arXiv Detail & Related papers (2022-12-02T13:57:53Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain specific solutions to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
- Compressing Deep ODE-Nets using Basis Function Expansions [105.05435207079759]
We consider formulations of the weights as continuous-depth functions using linear combinations of basis functions.
This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance.
In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments.
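The change-of-basis idea can be mimicked in a few lines: treat a weight as a smooth function of depth, project its samples onto a small polynomial basis by least squares, and store only the coefficients. This is an illustrative toy, not the paper's method (which operates on full ODE-Net weight tensors):

```python
import numpy as np

# A "continuous-depth" weight: a smooth scalar function of depth t,
# sampled on a fine grid (each entry of an ODE-Net weight tensor would
# get the same treatment).
t = np.linspace(0.0, 1.0, 200)
w = np.sin(2 * np.pi * t) + 0.5 * t

# Change of basis: least-squares projection onto a degree-5 polynomial
# basis. Six coefficients now replace the 200 stored samples.
coeffs = np.polynomial.polynomial.polyfit(t, w, deg=5)

# The compressed representation can be evaluated at any depth.
w_compressed = np.polynomial.polynomial.polyval(t, coeffs)
assert np.max(np.abs(w - w_compressed)) < 0.05
```

For smooth weight trajectories the projection is nearly lossless, which is what allows compression without retraining.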
arXiv Detail & Related papers (2021-06-21T03:04:51Z)
- Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z)
- Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.