Orchestrated Value Mapping for Reinforcement Learning
- URL: http://arxiv.org/abs/2203.07171v2
- Date: Wed, 16 Mar 2022 23:33:21 GMT
- Title: Orchestrated Value Mapping for Reinforcement Learning
- Authors: Mehdi Fatemi and Arash Tavakoli
- Abstract summary: We present a class of reinforcement learning algorithms based on two distinct principles.
The first principle enables incorporating specific properties into the value estimator that can enhance learning.
The second principle, on the other hand, allows for the value function to be represented as a composition of multiple utility functions.
- Score: 15.000818334408805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a general convergent class of reinforcement learning algorithms
that is founded on two distinct principles: (1) mapping value estimates to a
different space using arbitrary functions from a broad class, and (2) linearly
decomposing the reward signal into multiple channels. The first principle
enables incorporating specific properties into the value estimator that can
enhance learning. The second principle, on the other hand, allows for the value
function to be represented as a composition of multiple utility functions. This
can be leveraged for various purposes, e.g. dealing with highly varying reward
scales, incorporating a priori knowledge about the sources of reward, and
ensemble learning. Combining the two principles yields a general blueprint for
instantiating convergent algorithms by orchestrating diverse mapping functions
over multiple reward channels. This blueprint generalizes and subsumes
algorithms such as Q-Learning, Log Q-Learning, and Q-Decomposition. In
addition, our convergence proof for this general class relaxes certain required
assumptions in some of these algorithms. Based on our theory, we discuss
several interesting configurations as special cases. Finally, to illustrate the
potential of the design space that our theory opens up, we instantiate a
particular algorithm and evaluate its performance on the Atari suite.
Related papers
- Multi-agent imitation learning with function approximation: Linear Markov games and beyond [63.14746189846806]
We present the first theoretical analysis of multi-agent imitation learning (MAIL) in linear Markov games. We show that it is possible to replace the state-action level "all policy deviation concentrability coefficient" with a concentrability coefficient defined at the feature level. We propose a deep MAIL interactive algorithm which clearly outperforms BC on games such as Tic-Tac-Toe and Connect4.
arXiv Detail & Related papers (2026-02-26T09:50:15Z) - Multi-Property Synthesis [69.79949693440426]
We study synthesis with multiple properties, where satisfying all properties may be impossible. Instead of enumerating subsets of properties, we compute in one fixed-point computation the relation between product-game states and the goal sets that are realizable from them.
arXiv Detail & Related papers (2026-01-15T18:18:33Z) - A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance.
We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.
We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
arXiv Detail & Related papers (2024-07-02T14:12:21Z) - Quantization of Large Language Models with an Overdetermined Basis [73.79368761182998]
We introduce an algorithm for data quantization based on the principles of Kashin representation.
Our findings demonstrate that Kashin Quantization achieves competitive or superior quality in model performance.
arXiv Detail & Related papers (2024-04-15T12:38:46Z) - Graph Positional Encoding via Random Feature Propagation [39.84324765957645]
Two main families of node feature augmentation schemes have been explored for enhancing GNNs.
We propose a novel family of positional encoding schemes which draws a link between the above two approaches.
We empirically demonstrate that RFP significantly outperforms both spectral PE and random features in multiple node classification and graph classification benchmarks.
arXiv Detail & Related papers (2023-03-06T06:28:20Z) - Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms [63.03966552670014]
We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations.
The algorithms we provide allow for learning primals, optima for the dual representation and corresponding fair risk allocations.
arXiv Detail & Related papers (2023-02-02T22:16:49Z) - Generalization on the Unseen, Logic Reasoning and Degree Curriculum [25.7378861650474]
This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting.
We study how different network architectures trained by (S)GD perform under GOTU.
More specifically, the learned solutions tend toward an interpolator of the training data that has minimal Fourier mass on the higher-degree basis elements.
arXiv Detail & Related papers (2023-01-30T17:44:05Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Classical shadows with Pauli-invariant unitary ensembles [0.0]
We consider the class of Pauli-invariant unitary ensembles that are invariant under multiplication by a Pauli operator.
Our results pave the way for more efficient or robust protocols for predicting important properties of quantum states.
arXiv Detail & Related papers (2022-02-07T15:06:30Z) - pRSL: Interpretable Multi-label Stacking by Learning Probabilistic Rules [0.0]
We present the probabilistic rule stacking (pRSL) which uses probabilistic propositional logic rules and belief propagation to combine the predictions of several underlying classifiers.
We derive algorithms for exact and approximate inference and learning, and show that pRSL reaches state-of-the-art performance on various benchmark datasets.
arXiv Detail & Related papers (2021-05-28T14:06:21Z) - Finite-Function-Encoding Quantum States [52.77024349608834]
We introduce finite-function-encoding (FFE) states which encode arbitrary $d$-valued logic functions.
We investigate some of their structural properties.
arXiv Detail & Related papers (2020-12-01T13:53:23Z) - A Functional Perspective on Learning Symmetric Functions with Neural Networks [48.80300074254758]
We study the learning and representation of neural networks defined on measures.
We establish approximation and generalization bounds under different choices of regularization.
The resulting models can be learned efficiently and enjoy generalization guarantees that extend across input sizes.
arXiv Detail & Related papers (2020-08-16T16:34:33Z) - Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity [0.0]
Maxmin and Ensemble Q-learning algorithms use the different estimates provided by an ensemble of learners to reduce overestimation bias.
Unfortunately, these learners can converge to the same point in the parametric or representation space, falling back to the classic single neural network DQN.
We propose and compare five regularization functions inspired from economics theory and consensus optimization.
arXiv Detail & Related papers (2020-06-24T15:53:20Z)
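One toy example of a diversity regularizer in this spirit (illustrative only, not one of the paper's five functions) is a mean pairwise cosine-similarity penalty over the ensemble members' representations, which is zero when representations are orthogonal and maximal when they collapse to the same point:

```python
import numpy as np

def diversity_penalty(reps):
    # Illustrative penalty: mean pairwise cosine similarity between the
    # representation vectors of ensemble members. Minimizing this term
    # pushes the members' representations apart.
    reps = np.stack(reps)                               # shape: (k, d)
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sim = normed @ normed.T                             # cosine similarities
    k = len(reps)
    off_diag = sim.sum() - np.trace(sim)                # drop self-similarity
    return off_diag / (k * (k - 1))                     # mean over pairs
```

Identical representations give a penalty of 1, orthogonal ones give 0, so adding this term to the ensemble loss discourages the collapse described above.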
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.