Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering
- URL: http://arxiv.org/abs/2403.20020v1
- Date: Fri, 29 Mar 2024 07:15:30 GMT
- Title: Nonparametric Bellman Mappings for Reinforcement Learning: Application to Robust Adaptive Filtering
- Authors: Yuki Akiyama, Minh Vu, Konstantinos Slavakis
- Abstract summary: This paper designs novel nonparametric Bellman mappings in reproducing kernel Hilbert spaces (RKHSs) for reinforcement learning (RL).
The proposed mappings benefit from the rich approximating properties of RKHSs, adopt no assumptions on the statistics of the data owing to their nonparametric nature, and may operate without any training data.
As an application, the proposed mappings are employed to offer a novel solution to the problem of countering outliers in adaptive filtering.
- Score: 3.730504020733928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper designs novel nonparametric Bellman mappings in reproducing kernel Hilbert spaces (RKHSs) for reinforcement learning (RL). The proposed mappings benefit from the rich approximating properties of RKHSs, adopt no assumptions on the statistics of the data owing to their nonparametric nature, require no knowledge on transition probabilities of Markov decision processes, and may operate without any training data. Moreover, they allow for sampling on-the-fly via the design of trajectory samples, re-use past test data via experience replay, effect dimensionality reduction by random Fourier features, and enable computationally lightweight operations to fit into efficient online or time-adaptive learning. The paper offers also a variational framework to design the free parameters of the proposed Bellman mappings, and shows that appropriate choices of those parameters yield several popular Bellman-mapping designs. As an application, the proposed mappings are employed to offer a novel solution to the problem of countering outliers in adaptive filtering. More specifically, with no prior information on the statistics of the outliers and no training data, a policy-iteration algorithm is introduced to select online, per time instance, the ``optimal'' coefficient p in the least-mean-p-power-error method. Numerical tests on synthetic data showcase, in most of the cases, the superior performance of the proposed solution over several RL and non-RL schemes.
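To make the adaptive-filtering application concrete, below is a minimal Python sketch, not the paper's algorithm, of a least-mean-p-power (LMP) filter whose exponent p is switched online per time instance. The `toy_policy` rule is a hypothetical stand-in for the paper's RKHS-based policy-iteration selector, and all names and constants are illustrative.

```python
import numpy as np

def lmp_step(w, x, d, p, mu=0.01, eps=1e-8):
    """One least-mean-p-power (LMP) update: a stochastic-gradient step on |e|^p."""
    e = d - w @ x                                   # a-priori estimation error
    # d/dw |e|^p = -p * |e|^(p-1) * sign(e) * x; eps avoids 0**(negative exponent)
    w = w + mu * p * (abs(e) + eps) ** (p - 1) * np.sign(e) * x
    return w, e

def toy_policy(recent_errors, grid=(1.0, 1.2, 1.5, 1.8, 2.0)):
    """Placeholder heuristic (NOT the paper's RKHS policy iteration):
    pick a small p when recent errors look impulsive, else p = 2 (plain LMS)."""
    if recent_errors and max(abs(e) for e in recent_errors) > 5 * np.median(np.abs(recent_errors)):
        return grid[0]
    return grid[-1]

rng = np.random.default_rng(0)
w_true, w = rng.standard_normal(5), np.zeros(5)
errors = []
for n in range(2000):
    x = rng.standard_normal(5)
    noise = 0.1 * rng.standard_normal()
    if rng.random() < 0.05:                         # sparse impulsive outliers
        noise += 20.0 * rng.standard_normal()
    d_n = w_true @ x + noise
    p = toy_policy(errors[-50:])                    # per-instance choice of p
    w, e = lmp_step(w, x, d_n, p)
    errors.append(e)
print("final parameter error:", np.linalg.norm(w - w_true))
```

In the paper itself, the per-instance choice of p comes from an approximate policy-iteration scheme built on the proposed nonparametric Bellman mappings rather than from a hand-crafted threshold rule.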
Related papers
- Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering [4.140907550856865]
This paper introduces the novel class of proximal Bellman mappings.
The mappings are defined in reproducing kernel Hilbert spaces.
An approximate policy-iteration scheme is built on the proposed class of mappings.
arXiv Detail & Related papers (2023-09-14T09:20:21Z)
- Low-rank extended Kalman filtering for online learning of neural networks from streaming data [71.97861600347959]
We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream.
The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix.
In contrast to methods based on variational inference, our method is fully deterministic, and does not require step-size tuning.
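As a rough, generic linear-algebra illustration, not the algorithm of that paper, the sketch below shows why a diagonal-plus-low-rank representation of a precision matrix is attractive: systems with a matrix of the form diag(d) + W W^T can be solved through a small k-by-k system via the Woodbury identity, without ever forming the full n-by-n matrix. The function and variable names are assumptions made for the example.

```python
import numpy as np

def woodbury_solve(d, W, b):
    """Solve (diag(d) + W @ W.T) x = b without forming the full n x n matrix."""
    Dinv_b = b / d
    Dinv_W = W / d[:, None]
    k = W.shape[1]
    small = np.eye(k) + W.T @ Dinv_W            # only a k x k system to factor
    return Dinv_b - Dinv_W @ np.linalg.solve(small, W.T @ Dinv_b)

rng = np.random.default_rng(1)
n, k = 1000, 10
d = rng.uniform(1.0, 2.0, size=n)               # diagonal part (kept positive)
W = 0.1 * rng.standard_normal((n, k))           # low-rank factor
b = rng.standard_normal(n)
x = woodbury_solve(d, W, b)
# sanity check against the dense solve on this small example
assert np.allclose(x, np.linalg.solve(np.diag(d) + W @ W.T, b))
```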
arXiv Detail & Related papers (2023-05-31T03:48:49Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Online and lightweight kernel-based approximated policy iteration for dynamic p-norm linear adaptive filtering [8.319127681936815]
This paper introduces a solution to the problem of selecting dynamically (online) the ``optimal'' p-norm to combat outliers in linear adaptive filtering.
The proposed framework is built on kernel-based reinforcement learning (KBRL).
arXiv Detail & Related papers (2022-10-21T06:29:01Z)
- Dynamic selection of p-norm in linear adaptive filtering via online kernel-based reinforcement learning [8.319127681936815]
This study addresses the problem of selecting dynamically, at each time instance, the ``optimal'' p-norm to combat outliers in linear adaptive filtering.
An online and data-driven framework is designed via kernel-based reinforcement learning (KBRL).
arXiv Detail & Related papers (2022-10-20T14:49:39Z)
- A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning [50.910152564914405]
Existing posterior sampling methods for reinforcement learning are limited by being model-based or by lacking worst-case theoretical guarantees beyond linear MDPs.
This paper proposes a new model-free formulation of posterior sampling that applies to more general episodic reinforcement learning problems with theoretical guarantees.
arXiv Detail & Related papers (2022-08-23T12:21:01Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients that are involved in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on AdaLASSO (the adaptive least absolute shrinkage and selection operator).
Experiments show that the risk of overfitting is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights several benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z)
- A Heuristic for Dynamic Output Predictive Control Design for Uncertain Nonlinear Systems [0.0]
An efficient construction of the learning data set is proposed, in which each solution provides many samples for the learning data.
The proposed solution recovers up to 78% of the expected advantage of having perfect knowledge of the parameters, compared to the nominal design.
arXiv Detail & Related papers (2021-02-03T20:01:25Z)
- AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z)