Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
- URL: http://arxiv.org/abs/2503.16192v1
- Date: Thu, 20 Mar 2025 14:39:21 GMT
- Title: Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
- Authors: Yuki Akiyama, Konstantinos Slavakis,
- Abstract summary: This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL)<n>B-Maps operate on Q-functions represented in a kernel Hilbert space, enabling a nonparametric formulation.<n> Numerical experiments on two well-known control problems demonstrate the superior performance of the proposed nonparametric B-Maps.
- Score: 3.5051814539447474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where multiple agents operate over a network without a centralized fusion node. Each agent constructs its own nonparametric B-Map for VI while communicating only with direct neighbors to achieve consensus. These B-Maps operate on Q-functions represented in a reproducing kernel Hilbert space, enabling a nonparametric formulation that allows for flexible, agent-specific basis function design. Unlike existing DRL methods that restrict information exchange to Q-function estimates, the proposed framework also enables agents to share basis information in the form of covariance matrices, capturing additional structural details. A theoretical analysis establishes linear convergence rates for both Q-function and covariance-matrix estimates toward their consensus values. The optimal learning rates for consensus-based updates are dictated by the ratio of the smallest positive eigenvalue to the largest one of the network's Laplacian matrix. Furthermore, each nodal Q-function estimate is shown to lie very close to the fixed point of a centralized nonparametric B-Map, effectively allowing the proposed DRL design to approximate the performance of a centralized fusion center. Numerical experiments on two well-known control problems demonstrate the superior performance of the proposed nonparametric B-Maps compared to prior methods. Notably, the results reveal a counter-intuitive finding: although the proposed approach involves greater information exchange -- specifically through the sharing of covariance matrices -- it achieves the desired performance with lower cumulative communication cost than existing DRL schemes, highlighting the crucial role of basis information in accelerating the learning process.
Related papers
- Covariates-Adjusted Mixed-Membership Estimation: A Novel Network Model with Optimal Guarantees [3.6936359356095454]
This paper addresses the problem of estimation in networks, where the goal is to efficiently estimate the latent mixed-membership structure from the network.
We propose a novel model that incorporates both information, and similarities to the node co-membership model.
We show that our approach achieves optimal accuracy for both the similarity matrix and the Frobenius norm entry loss.
arXiv Detail & Related papers (2025-02-10T16:56:00Z) - Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Learning Compact Channel Correlation Representation for LiDAR Place Recognition [4.358456799125694]
We present a novel approach to learn compact channel correlation representation for LiDAR place recognition, called C3R.
Our method partitions the feature matrix into smaller groups, computes group-wise covariance matrices, and aggregates them via a learnable aggregation strategy.
We conduct extensive experiments on four large-scale, public LiDAR place recognition datasets to validate our approach's superiority in accuracy, and robustness.
arXiv Detail & Related papers (2024-09-24T09:40:22Z) - Robust Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric [5.153195958837083]
We propose a novel mean square group metric (MSGM) to build the optimization objective in the LiDAR BA algorithm.
By integrating a robust kernel function, the metrics involved in the BA algorithm are reweighted, and thus enhancing the robustness of the solution process.
arXiv Detail & Related papers (2024-09-03T12:53:39Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC.<n>We increase the consistency and informativeness of the pairwise preference signals through targeted modifications.<n>We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless
Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed with the analysis of the properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Distributed Variational Inference for Online Supervised Learning [15.038649101409804]
This paper develops a scalable distributed probabilistic inference algorithm.
It applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks.
arXiv Detail & Related papers (2023-09-05T22:33:02Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint
Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling.
Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
arXiv Detail & Related papers (2023-03-01T15:16:56Z) - On Centralized and Distributed Mirror Descent: Exponential Convergence
Analysis Using Quadratic Constraints [8.336315962271396]
Mirror descent (MD) is a powerful first-order optimization technique that subsumes several algorithms including gradient descent (GD)
We study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems.
arXiv Detail & Related papers (2021-05-29T23:05:56Z) - Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval [11.988383965639954]
Cross-modal hashing methods have attracted extensive attention in similarity retrieval across the heterogeneous modalities.
ACQH is a novel Asymmetric Correlation Quantization Hashing (ACQH) method proposed in this paper.
It learns the projection matrixs of heterogeneous modalities data points for transforming query into a low-dimensional real-valued vector in latent semantic space.
It constructs the stacked compositional quantization embedding in a coarse-to-fine manner for indicating database points by a series of learnt real-valued codeword.
arXiv Detail & Related papers (2020-01-14T04:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.