Proximal Bellman mappings for reinforcement learning and their
application to robust adaptive filtering
- URL: http://arxiv.org/abs/2309.07548v1
- Date: Thu, 14 Sep 2023 09:20:21 GMT
- Title: Proximal Bellman mappings for reinforcement learning and their
application to robust adaptive filtering
- Authors: Yuki Akiyama and Konstantinos Slavakis
- Abstract summary: This paper introduces the novel class of proximal Bellman mappings.
The mappings are defined in reproducing kernel Hilbert spaces.
An approximate policy-iteration scheme is built on the proposed class of mappings.
- Score: 4.140907550856865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper aims at the algorithmic/theoretical core of reinforcement learning
(RL) by introducing the novel class of proximal Bellman mappings. These
mappings are defined in reproducing kernel Hilbert spaces (RKHSs) to benefit
from the rich approximation properties and inner product of RKHSs. They are
shown to belong to the powerful Hilbertian family of (firmly) nonexpansive
mappings, regardless of the values of their discount factors, and possess ample
degrees of design freedom to even reproduce attributes of the classical Bellman
mappings and to pave the way for novel RL designs. An approximate
policy-iteration scheme is built on the proposed class of mappings to solve the
problem of selecting online, at every time instance, the "optimal" exponent $p$
in a $p$-norm loss to combat outliers in linear adaptive filtering, without
training data and without any knowledge of the statistical properties of the outliers.
Numerical tests on synthetic data showcase the superior performance of the
proposed framework over several non-RL and kernel-based RL schemes.
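The application described in the abstract pairs a linear adaptive filter driven by a $p$-norm loss with an RL agent that picks the exponent $p$ online. A minimal Python sketch of that idea follows; it is not the paper's algorithm. It uses the standard $p$-norm stochastic-gradient update (instantaneous loss $|e|^p/p$, gradient $-|e|^{p-1}\mathrm{sgn}(e)\,x$) and replaces the proposed RKHS-based approximate policy iteration with a crude epsilon-greedy choice of $p$ over running reward estimates. The grid of exponents, step size, exploration rate, and reward definition are illustrative assumptions.

# Minimal, illustrative sketch (not the paper's algorithm): a linear adaptive
# filter updated with a p-norm instantaneous loss |e|^p / p, where the exponent
# p is picked online from a small grid by an epsilon-greedy rule over running
# reward estimates. The paper instead builds an approximate policy-iteration
# scheme on proximal Bellman mappings in an RKHS; GRID_P, MU, EPS and the
# reward below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

D = 5                                   # filter length
N = 3000                                # number of samples
MU = 0.01                               # step size
GRID_P = [1.0, 1.25, 1.5, 1.75, 2.0]    # candidate exponents ("actions")
EPS = 0.1                               # exploration rate

w_true = rng.standard_normal(D)         # unknown system to identify
w = np.zeros(D)                         # adaptive-filter weights

# Running reward estimate and visit count per candidate p (a crude stand-in
# for the paper's kernel-based value function).
q = np.zeros(len(GRID_P))
n_visits = np.zeros(len(GRID_P))

for n in range(N):
    x = rng.standard_normal(D)                  # input regressor
    noise = rng.standard_normal() * 0.05
    if rng.random() < 0.05:                     # sparse, heavy outliers
        noise += rng.standard_normal() * 10.0
    d = w_true @ x + noise                      # desired response

    # Choose the exponent p for this time instance (epsilon-greedy).
    if rng.random() < EPS:
        a = rng.integers(len(GRID_P))
    else:
        a = int(np.argmax(q))
    p = GRID_P[a]

    # p-norm stochastic-gradient step: loss |e|^p / p, gradient -|e|^{p-1} sgn(e) x.
    e = d - w @ x
    w = w + MU * (abs(e) ** (p - 1.0)) * np.sign(e) * x

    # Reward: negative squared misalignment from the true system (illustrative
    # only; in practice the true system is unknown and rewards are data-driven).
    r = -np.linalg.norm(w - w_true) ** 2
    n_visits[a] += 1
    q[a] += (r - q[a]) / n_visits[a]

print("final misalignment:", np.linalg.norm(w - w_true))
print("reward estimates per p:", dict(zip(GRID_P, np.round(q, 3))))

For $p = 2$ the update reduces to plain LMS and for $p = 1$ to sign-LMS, which is why intermediate exponents can trade convergence speed against robustness to the heavy-tailed outliers injected above.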
Related papers
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering [3.730504020733928]
This paper designs novel nonparametric Bellman mappings in reproducing kernel Hilbert spaces (RKHSs) for reinforcement learning (RL).
The proposed mappings benefit from the rich approximating properties of RKHSs, adopt no assumptions on the statistics of the data owing to their nonparametric nature, and may operate without any training data.
As an application, the proposed mappings are employed to offer a novel solution to the problem of countering outliers in adaptive filtering.
arXiv Detail & Related papers (2024-03-29T07:15:30Z) - A Neuromorphic Architecture for Reinforcement Learning from Real-Valued
Observations [0.34410212782758043]
Reinforcement Learning (RL) provides a powerful framework for decision-making in complex environments.
This paper presents a novel Spiking Neural Network (SNN) architecture for solving RL problems with real-valued observations.
arXiv Detail & Related papers (2023-07-06T12:33:34Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant
Fuel Optimization [0.0]
This work presents a first-of-a-kind approach that utilizes deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization.
arXiv Detail & Related papers (2023-05-09T23:51:24Z) - Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning [73.80728148866906]
Quasimetric Reinforcement Learning (QRL) is a new RL method that utilizes quasimetric models to learn optimal value functions.
On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance.
arXiv Detail & Related papers (2023-04-03T17:59:58Z) - online and lightweight kernel-based approximated policy iteration for
dynamic p-norm linear adaptive filtering [8.319127681936815]
This paper introduces a solution to the problem of selecting dynamically (online) the "optimal" p-norm to combat outliers in linear adaptive filtering.
The proposed framework is built on kernel-based reinforcement learning (KBRL).
arXiv Detail & Related papers (2022-10-21T06:29:01Z) - Dynamic selection of p-norm in linear adaptive filtering via online
kernel-based reinforcement learning [8.319127681936815]
This study addresses the problem of selecting dynamically, at each time instance, the "optimal" p-norm to combat outliers in linear adaptive filtering.
An online and data-driven framework is designed via kernel-based reinforcement learning (KBRL).
arXiv Detail & Related papers (2022-10-20T14:49:39Z) - Exponential Family Model-Based Reinforcement Learning via Score Matching [97.31477125728844]
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL).
SMRL uses score matching, an unnormalized density estimation technique that enables efficient estimation of the model parameter by ridge regression.
arXiv Detail & Related papers (2021-12-28T15:51:07Z) - B-Pref: Benchmarking Preference-Based Reinforcement Learning [84.41494283081326]
We introduce B-Pref, a benchmark specially designed for preference-based RL.
A key challenge with such a benchmark is providing the ability to evaluate candidate algorithms quickly.
B-Pref alleviates this by simulating teachers with a wide array of irrationalities.
arXiv Detail & Related papers (2021-11-04T17:32:06Z) - Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.