Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
- URL: http://arxiv.org/abs/2503.16192v2
- Date: Wed, 08 Oct 2025 02:54:52 GMT
- Title: Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning
- Authors: Yuki Akiyama, Konstantinos Slavakis
- Abstract summary: This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL). Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space. A detailed performance analysis shows that the proposed DRL framework effectively approximates the performance of a centralized node.
- Score: 8.324857108715007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where agents are deployed over an undirected, connected graph/network with arbitrary topology -- but without a centralized node, that is, a node capable of aggregating all data and performing computations. Each agent constructs a nonparametric B-Map from its private data, operating on Q-functions represented in a reproducing kernel Hilbert space, with flexibility in choosing the basis for their representation. Agents exchange their Q-function estimates only with direct neighbors, and unlike existing DRL approaches that restrict communication to Q-functions, the proposed framework also enables the transmission of basis information in the form of covariance matrices, thereby conveying additional structural details. Linear convergence rates are established for both Q-function and covariance-matrix estimates toward their consensus values, regardless of the network topology, with optimal learning rates determined by the ratio of the smallest positive eigenvalue (the graph's Fiedler value) to the largest eigenvalue of the graph Laplacian matrix. A detailed performance analysis further shows that the proposed DRL framework effectively approximates the performance of a centralized node, had such a node existed. Numerical tests on two benchmark control problems confirm the effectiveness of the proposed nonparametric B-Maps relative to prior methods. Notably, the tests reveal a counter-intuitive outcome: although the framework involves richer information exchange -- specifically through transmitting covariance matrices as basis information -- it achieves the desired performance at a lower cumulative communication cost than existing DRL schemes, underscoring the critical role of sharing basis information in accelerating the learning process.
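The convergence result above hinges on the ratio of the graph Laplacian's Fiedler value to its largest eigenvalue. As a rough illustration (plain NumPy, not the paper's code), the sketch below computes these two eigenvalues for a small graph and the classical consensus step size 2/(λ₂ + λ_max); the resulting contraction factor shrinks as the ratio λ₂/λ_max grows. The exact learning-rate formula used in the paper may differ.

```python
import numpy as np

def consensus_rate(adjacency):
    """Spectral quantities governing consensus speed on a graph:
    the Fiedler value (smallest positive Laplacian eigenvalue),
    the largest eigenvalue, the classical step size
    eps = 2 / (lam2 + lam_max), and the per-iteration contraction
    factor, which improves as the ratio lam2 / lam_max grows."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    eigs = np.sort(np.linalg.eigvalsh(L))
    lam2, lam_max = eigs[1], eigs[-1]
    eps = 2.0 / (lam2 + lam_max)
    rho = (lam_max - lam2) / (lam_max + lam2)   # contraction per iteration
    return lam2, lam_max, eps, rho

# 4-node path graph: connected, so lam2 > 0 and consensus is geometric
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
lam2, lam_max, eps, rho = consensus_rate(A)
print(0 < rho < 1)  # True
```

Topology enters only through these two eigenvalues, which is why the stated rates hold for arbitrary connected graphs.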
Related papers
- Amortized Spectral Kernel Discovery via Prior-Data Fitted Network [0.0]
We introduce an interpretability-driven framework for amortized spectral discovery from pre-trained PFNs with decoupled attention. We propose decoder architectures that map PFN latents to explicit spectral density estimates and corresponding stationary kernels. This yields orders-of-magnitude reductions in inference time compared to optimization-based baselines.
arXiv Detail & Related papers (2026-01-29T13:51:26Z) - Evaluating the Efficiency of Latent Spaces via the Coupling-Matrix [0.5013248430919224]
We introduce a redundancy index, denoted rho(C), that directly quantifies inter-dimensional dependencies. Low rho(C) reliably predicts high classification accuracy or low reconstruction error, while elevated redundancy is associated with performance collapse. We show that Tree-structured Parzen Estimators (TPE) preferentially explore low-rho regions, suggesting that rho(C) can guide neural architecture search and serve as a redundancy-aware regularization target.
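The abstract does not spell out the definition of rho(C), so the sketch below uses a plausible stand-in: the mean absolute off-diagonal entry of the latent dimensions' correlation (coupling) matrix, which is 0 for fully decorrelated dimensions and approaches 1 under collapse. The paper's exact index may differ.

```python
import numpy as np

def redundancy_index(Z):
    """Illustrative redundancy index: mean absolute off-diagonal entry
    of the correlation (coupling) matrix of latent codes Z, shaped
    (n_samples, n_dims). 0 = decorrelated dimensions; near 1 = collapse.
    (Stand-in for the paper's rho(C), whose definition may differ.)"""
    C = np.corrcoef(Z, rowvar=False)        # n_dims x n_dims coupling matrix
    d = C.shape[0]
    off = np.abs(C - np.eye(d))
    return off.sum() / (d * (d - 1))        # average off-diagonal magnitude

rng = np.random.default_rng(0)
independent = rng.normal(size=(1000, 8))    # well-spread latent space
x = rng.normal(size=(1000, 1))
collapsed = np.hstack([x + 0.01 * rng.normal(size=(1000, 1))
                       for _ in range(8)])  # 8 near-copies of one signal
print(redundancy_index(independent) < redundancy_index(collapsed))  # True
```

Under this proxy, the collapsed latent space scores near 1 while the independent one stays near 0, matching the claimed link between redundancy and performance collapse.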
arXiv Detail & Related papers (2025-09-08T03:36:47Z) - Communication-Efficient Personalized Distributed Learning with Data and Node Heterogeneity [40.64395367773766]
We propose a distributed strong lottery ticket hypothesis (DSLTH), based on which a communication-efficient personalized learning algorithm is developed. In the proposed method, each local model is represented as the Hadamard product of global real-valued parameters and a personalized binary mask for pruning. We provide a theoretical proof for the DSLTH, establishing it as the foundation of the proposed method.
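The Hadamard-product construction above is easy to state concretely. The sketch below (hypothetical names, not the paper's code) forms a local model as the element-wise product of shared global weights and an agent-specific binary pruning mask; only the compact mask is personalized.

```python
import numpy as np

def personalized_model(global_weights, binary_mask):
    """Local model as the Hadamard (element-wise) product of shared
    real-valued global parameters and a per-agent binary pruning mask,
    in the spirit of the distributed strong lottery ticket setup."""
    assert global_weights.shape == binary_mask.shape
    return global_weights * binary_mask

W = np.array([[0.5, -1.2],
              [0.3,  0.8]])   # shared global parameters
m = np.array([[1, 0],
              [0, 1]])        # agent-specific binary mask
local = personalized_model(W, m)
print(local)                  # entries zeroed where the mask prunes
```

Because the mask is binary, personalizing it is far cheaper to communicate than exchanging full real-valued parameter updates.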
arXiv Detail & Related papers (2025-04-24T13:02:54Z) - Covariates-Adjusted Mixed-Membership Estimation: A Novel Network Model with Optimal Guarantees [3.6936359356095454]
This paper addresses the problem of mixed-membership estimation in networks, where the goal is to efficiently estimate the latent mixed-membership structure from observed network data.
We propose a novel model that incorporates both network and covariate information and bears similarities to the node co-membership model.
We show that our approach achieves optimal accuracy under both the Frobenius-norm and entrywise losses.
arXiv Detail & Related papers (2025-02-10T16:56:00Z) - Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Learning Compact Channel Correlation Representation for LiDAR Place Recognition [4.358456799125694]
We present a novel approach to learn compact channel correlation representation for LiDAR place recognition, called C3R.
Our method partitions the feature matrix into smaller groups, computes group-wise covariance matrices, and aggregates them via a learnable aggregation strategy.
We conduct extensive experiments on four large-scale, public LiDAR place recognition datasets to validate our approach's superiority in accuracy and robustness.
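The group-wise covariance step above can be sketched in a few lines. The code below (a simplified stand-in, with a plain mean in place of the paper's learnable aggregation) splits the channel dimension into groups, computes each group's covariance, and averages them into one compact descriptor.

```python
import numpy as np

def compact_channel_correlation(F, n_groups=4):
    """C3R-style descriptor sketch: partition the channels of a feature
    matrix F (n_points x n_channels) into groups, compute group-wise
    covariance matrices, and aggregate them. A plain mean replaces the
    paper's learnable aggregation strategy."""
    groups = np.split(F, n_groups, axis=1)           # channel-wise groups
    covs = [np.cov(g, rowvar=False) for g in groups]
    return np.mean(covs, axis=0)                     # aggregated covariance

F = np.random.default_rng(1).normal(size=(256, 64))  # e.g. point features
desc = compact_channel_correlation(F, n_groups=4)
print(desc.shape)  # (16, 16): far smaller than the full 64x64 covariance
```

Grouping is what makes the representation compact: the descriptor scales with the group size squared rather than the full channel count squared.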
arXiv Detail & Related papers (2024-09-24T09:40:22Z) - Robust Second-order LiDAR Bundle Adjustment Algorithm Using Mean Squared Group Metric [5.153195958837083]
We propose a novel mean square group metric (MSGM) to build the optimization objective in the LiDAR BA algorithm.
By integrating a robust kernel function, the metrics involved in the BA algorithm are reweighted, thus enhancing the robustness of the solution process.
arXiv Detail & Related papers (2024-09-03T12:53:39Z) - Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z) - BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning [39.090104460303415]
Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data. We introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm to capture causal representation for both states and actions.
arXiv Detail & Related papers (2024-07-15T17:59:23Z) - Redefining the Shortest Path Problem Formulation of the Linear Non-Gaussian Acyclic Model: Pairwise Likelihood Ratios, Prior Knowledge, and Path Enumeration [1.5178009359320295]
The paper proposes a threefold enhancement to the LiNGAM-SPP framework. The need for parameter tuning is eliminated by using the pairwise likelihood ratio in lieu of kNN-based mutual information. The incorporation of prior knowledge is then enabled by a node-skipping strategy implemented on the graph representation of all causal orderings.
arXiv Detail & Related papers (2024-04-18T05:59:28Z) - Rethinking Clustered Federated Learning in NOMA Enhanced Wireless Networks [60.09912912343705]
This study explores the benefits of integrating the novel clustered federated learning (CFL) approach with non-independent and identically distributed (non-IID) datasets.
A detailed theoretical analysis of the generalization gap that measures the degree of non-IID in the data distribution is presented.
Solutions to address the challenges posed by non-IID conditions are proposed, together with an analysis of their properties.
arXiv Detail & Related papers (2024-03-05T17:49:09Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Distributed Variational Inference for Online Supervised Learning [15.038649101409804]
This paper develops a scalable distributed probabilistic inference algorithm.
It applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks.
arXiv Detail & Related papers (2023-09-05T22:33:02Z) - Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Distributed Learning over Networks with Graph-Attention-Based Personalization [49.90052709285814]
We propose a graph-based personalized algorithm (GATTA) for distributed deep learning.
In particular, the personalized model in each agent is composed of a global part and a node-specific part.
By treating each agent as one node in a graph and the node-specific parameters as its features, the benefits of the graph attention mechanism can be inherited.
arXiv Detail & Related papers (2023-05-22T13:48:30Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling.
Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
arXiv Detail & Related papers (2023-03-01T15:16:56Z) - On Centralized and Distributed Mirror Descent: Exponential Convergence Analysis Using Quadratic Constraints [8.336315962271396]
Mirror descent (MD) is a powerful first-order optimization technique that subsumes several algorithms including gradient descent (GD).
We study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems.
arXiv Detail & Related papers (2021-05-29T23:05:56Z) - Asymmetric Correlation Quantization Hashing for Cross-modal Retrieval [11.988383965639954]
Cross-modal hashing methods have attracted extensive attention in similarity retrieval across heterogeneous modalities.
This paper proposes a novel Asymmetric Correlation Quantization Hashing (ACQH) method.
It learns the projection matrices of heterogeneous-modality data points to transform a query into a low-dimensional real-valued vector in the latent semantic space.
It constructs a stacked compositional quantization embedding in a coarse-to-fine manner to represent database points by a series of learnt real-valued codewords.
arXiv Detail & Related papers (2020-01-14T04:53:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.