Bounding the Optimal Value Function in Compositional Reinforcement
Learning
- URL: http://arxiv.org/abs/2303.02557v2
- Date: Tue, 13 Jun 2023 22:18:35 GMT
- Title: Bounding the Optimal Value Function in Compositional Reinforcement
Learning
- Authors: Jacob Adamczyk and Volodymyr Makarenko and Argenis Arriojas and Stas
Tiomkin and Rahul V. Kulkarni
- Abstract summary: We show that the optimal solution for a composite task can be related to the known primitive task solutions.
We also show that the regret of using a zero-shot policy can be bounded for this class of functions.
- Score: 2.7998963147546148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of reinforcement learning (RL), agents are often tasked with
solving a variety of problems differing only in their reward functions. In
order to quickly obtain solutions to unseen problems with new reward functions,
a popular approach involves functional composition of previously solved tasks.
However, previous work using such functional composition has primarily focused
on specific instances of composition functions whose limiting assumptions allow
for exact zero-shot composition. Our work unifies these examples and provides a
more general framework for compositionality in both standard and
entropy-regularized RL. We find that, for a broad class of functions, the
optimal solution for the composite task of interest can be related to the known
primitive task solutions. Specifically, we present double-sided inequalities
relating the optimal composite value function to the value functions for the
primitive tasks. We also show that the regret of using a zero-shot policy can
be bounded for this class of functions. The derived bounds can be used to
develop clipping approaches for reducing uncertainty during training, allowing
agents to quickly adapt to new tasks.
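The clipping idea from the abstract can be sketched in a few lines. The following is an illustrative tabular Q-learning step, not the authors' implementation; the arrays `L` and `U` are assumed to hold the double-sided bounds on the composite optimal value function derived from the primitive-task solutions.

```python
import numpy as np

def clipped_q_update(Q, L, U, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step on the composite task, with the
    bootstrapped target clipped into the known interval [L, U]."""
    target = r + gamma * Q[s_next].max()
    # Double-sided bounds L(s, a) <= Q*(s, a) <= U(s, a) from the
    # primitive tasks let us discard provably incorrect target values,
    # reducing uncertainty early in training.
    target = np.clip(target, L[s, a], U[s, a])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

With uninformative bounds (`L = -inf`, `U = +inf`) this reduces to ordinary Q-learning, so the clipping is a strict refinement rather than a change of algorithm.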
Related papers
- Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided
Bounds on the Value Function [4.48890356952206]
We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
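One standard instance of such double-sided bounds comes from the classical Bellman-residual argument: for any approximation Q̃ with residual δ = TQ̃ − Q̃, the optimal value satisfies Q̃ + min δ/(1−γ) ≤ Q* ≤ Q̃ + max δ/(1−γ). A tabular sketch (illustrative names, not the paper's code):

```python
import numpy as np

def value_bounds(Q_approx, R, P, gamma=0.9):
    """Double-sided bounds on Q* from an arbitrary approximation.

    R[s, a]: rewards; P[s, a, s']: transition probabilities.
    Applies the classical Bellman-residual bound
    Q_approx + min(delta)/(1-gamma) <= Q* <= Q_approx + max(delta)/(1-gamma).
    """
    V_next = Q_approx.max(axis=1)      # greedy state values under Q_approx
    TQ = R + gamma * P @ V_next        # one Bellman backup
    delta = TQ - Q_approx              # Bellman residual
    lower = Q_approx + delta.min() / (1 - gamma)
    upper = Q_approx + delta.max() / (1 - gamma)
    return lower, upper
```

The gap between the bounds shrinks as the residual becomes more uniform, and collapses to zero when Q_approx is a fixed point.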
arXiv Detail & Related papers (2023-02-19T21:47:24Z)
- ReMIX: Regret Minimization for Monotonic Value Function Factorization in
Multiagent Reinforcement Learning [10.741140541225604]
We study the optimal projection of an unrestricted mixing function onto monotonic function classes.
We use the Lagrangian multiplier method to obtain the closed-form optimal projection weights.
Our results on Predator-Prey and StarCraft Multiagent Challenge environments demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-02-11T03:52:51Z)
- Efficient Planning in Combinatorial Action Spaces with Applications to
Cooperative Multi-Agent Reinforcement Learning [16.844525262228103]
In cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a blow-up in the action space by the number of agents.
As a minimal requirement, we assume access to an argmax oracle that enables efficient computation of the greedy policy for any Q-function in the model class.
We propose efficient algorithms for this setting whose compute and query complexity are polynomial in all relevant problem parameters.
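The argmax-oracle assumption is easy to instantiate when the joint Q-function decomposes additively over agents: the greedy joint action then factorizes into independent per-agent maximizations. A hypothetical sketch of such an oracle (not from the paper):

```python
import numpy as np

def factored_argmax_oracle(q_per_agent):
    """Greedy joint action for an additively factored Q-function.

    q_per_agent: one 1-D array per agent, where entry i of array k is
    agent k's contribution Q_k(a_i) to the joint value. Because the
    joint Q is the sum of per-agent terms, the argmax decomposes, and
    the cost is the sum of the action-set sizes rather than their
    product -- avoiding the blow-up in the combinatorial action space.
    """
    return tuple(int(np.argmax(q)) for q in q_per_agent)
```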
arXiv Detail & Related papers (2023-02-08T23:42:49Z)
- Multi-Task Learning with Prior Information [5.770309971945476]
We propose a multi-task learning framework, where we utilize prior knowledge about the relations between features.
We also impose a penalty on changes in the coefficients for each specific feature, ensuring that related tasks have similar coefficients on the common features they share.
arXiv Detail & Related papers (2023-01-04T12:48:05Z)
- Utilizing Prior Solutions for Reward Shaping and Composition in
Entropy-Regularized Reinforcement Learning [3.058685580689605]
We develop a general framework for reward shaping and task composition in entropy-regularized RL.
We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL.
We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL.
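A small numerical experiment illustrates the kind of relation such composition frameworks exploit: in entropy-regularized RL, the optimal soft value function for an averaged reward is bounded above by the average of the primitive optimal value functions. A sketch using soft Q-iteration on a random tabular MDP (illustrative code, not the paper's):

```python
import numpy as np

def soft_q(R, P, beta=5.0, gamma=0.9, iters=300):
    """Soft (entropy-regularized) Q-iteration on a tabular MDP."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        # Soft maximum (log-sum-exp) replaces the hard max over actions.
        V = np.log(np.exp(beta * Q).sum(axis=1)) / beta
        Q = R + gamma * P @ V
    return Q

rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)          # normalize transitions
R1, R2 = rng.random((S, A)), rng.random((S, A))

Q1, Q2 = soft_q(R1, P), soft_q(R2, P)
Q_comp = soft_q(0.5 * (R1 + R2), P)        # composite (averaged) task
# Compositionality upper bound: the averaged-reward optimum never
# exceeds the average of the primitive optima.
assert (Q_comp <= 0.5 * (Q1 + Q2) + 1e-8).all()
```

The upper bound follows from the convexity of log-sum-exp; the cited works characterize when such relations become exact and how to bound them from below.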
arXiv Detail & Related papers (2022-12-02T13:57:53Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain specific solutions to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
- Compressing Deep ODE-Nets using Basis Function Expansions [105.05435207079759]
We consider formulations of the weights as continuous-depth functions using linear combinations of basis functions.
This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance.
In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments.
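The change-of-basis idea can be mimicked in a few lines: treat a weight as a smooth function of depth, project its samples onto a small polynomial basis by least squares, and store only the coefficients. This is an illustrative toy, not the paper's method (which operates on full ODE-Net weight tensors):

```python
import numpy as np

# A "continuous-depth" weight: a smooth scalar function of depth t,
# sampled on a fine grid (each entry of an ODE-Net weight tensor would
# get the same treatment).
t = np.linspace(0.0, 1.0, 200)
w = np.sin(2 * np.pi * t) + 0.5 * t

# Change of basis: least-squares projection onto a degree-5 polynomial
# basis. Six coefficients now replace the 200 stored samples.
coeffs = np.polynomial.polynomial.polyfit(t, w, deg=5)

# The compressed representation can be evaluated at any depth.
w_compressed = np.polynomial.polynomial.polyval(t, coeffs)
assert np.max(np.abs(w - w_compressed)) < 0.05
```

For smooth weight trajectories the projection is nearly lossless, which is what allows compression without retraining.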
arXiv Detail & Related papers (2021-06-21T03:04:51Z)
- Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization in which data collected across different domains help improving the learning performance at each other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- A Multi-Agent Primal-Dual Strategy for Composite Optimization over
Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z)
- Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.