Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach
- URL: http://arxiv.org/abs/2601.18952v1
- Date: Mon, 26 Jan 2026 20:46:00 GMT
- Title: Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach
- Authors: Mehrdad Mohammadi, Qi Zheng, Ruoqing Zhu,
- Abstract summary: We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL)<n>We estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy using Hilbert space mappings.<n> Simulations and empirical results demonstrate robust off-policy evaluation and recovery of the kernel mean embedding.
- Score: 5.7161009858370875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matern family of kernels and provide uniform convergence guarantees. Simulations and empirical results demonstrate robust off-policy evaluation and recovery of the kernel mean embedding under mild assumptions, namely, Lipschitz continuity and boundedness of the kernels, highlighting the potential of embedding-based approaches in complex real-world decision-making scenarios and risk evaluation.
Related papers
- Notes on Kernel Methods in Machine Learning [0.8435614464136675]
We develop the theory of positive definite kernels, reproducing kernel Hilbert spaces (RKHS), and Hilbert-Schmidt operators.<n>We also introduce kernel density estimation, kernel embeddings of distributions, and the Maximum Mean Discrepancy (MMD)<n>The exposition is designed to serve as a foundation for more advanced topics, including Gaussian processes, kernel Bayesian inference, and functional analytic approaches to modern machine learning.
arXiv Detail & Related papers (2025-11-18T13:29:07Z) - Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective [73.18641268511318]
We propose a graph-based clustering algorithm that only relaxes the orthonormal constraint to derive clustering results.<n>To ensure a doubly constraint into a gradient, we transform the non-negative constraint into a class probability parameter.
arXiv Detail & Related papers (2025-09-23T09:14:39Z) - Bounds in Wasserstein Distance for Locally Stationary Processes [0.29771206318712146]
We introduce a novel conditional probability distribution estimator specifically tailored for locally stationary (LSP) data.<n>We rigorously establish convergence rates for the NW-based conditional probability estimator under the Wasserstein metric.<n>We conduct extensive numerical simulations on synthetic datasets and provide empirical validations using real-world data.
arXiv Detail & Related papers (2024-12-04T15:51:22Z) - On Policy Evaluation Algorithms in Distributional Reinforcement Learning [0.0]
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL)
For a plain instance of our proposed class of algorithms we prove error bounds, both within Wasserstein and Kolmogorov--Smirnov distances.
For return distributions having probability density functions the algorithms yield approximations for these densities; error bounds are given within supremum norm.
arXiv Detail & Related papers (2024-07-19T10:06:01Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the proximal point method (SPPM)
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Safe Reinforcement Learning in Tensor Reproducing Kernel Hilbert Space [9.823296458696882]
In traditional partially observable Markov decision processes, ensuring safety typically involves estimating the belief in latent states.
We propose a model-based approach that guarantees RL safety almost surely in the face of unknown system dynamics.
arXiv Detail & Related papers (2023-12-01T17:01:37Z) - Distribution Regression with Sliced Wasserstein Kernels [45.916342378789174]
We propose the first OT-based estimator for distribution regression.
We study the theoretical properties of a kernel ridge regression estimator based on such representation.
arXiv Detail & Related papers (2022-02-08T15:21:56Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
implicit distributional actor-critic (IDAC) built on two deep generator networks (DGNs)
Semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.