Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation
- URL: http://arxiv.org/abs/2603.02426v1
- Date: Mon, 02 Mar 2026 22:10:56 GMT
- Title: Personalized Multi-Agent Average Reward TD-Learning via Joint Linear Approximation
- Authors: Leo Wang, Pengkun Yang, Lili Su
- Abstract summary: We study personalized multi-agent average reward TD learning, in which a collection of agents interacts with different environments. We focus on the setting where there exists a shared linear representation, and the agents' optimal weights collectively lie in an unknown linear subspace.
- Score: 36.652579641421106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study personalized multi-agent average reward TD learning, in which a collection of agents interacts with different environments and jointly learns their respective value functions. We focus on the setting where there exists a shared linear representation, and the agents' optimal weights collectively lie in an unknown linear subspace. Inspired by the recent success of personalized federated learning (PFL), we study the convergence of cooperative single-timescale TD learning in which agents iteratively estimate the common subspace and local heads. We show that this decomposition can filter out conflicting signals, effectively mitigating the negative impact of "misaligned" signals and achieving linear speedup. The main technical challenges lie in the heterogeneity, the Markovian sampling, and their intricate interplay in shaping error evolution. Specifically, not only are the error dynamics of multiple variables closely interconnected, but there is also no direct contraction for the principal angle distance between the optimal subspace and the estimated subspace. We hope our analytical techniques inspire deeper exploration into leveraging common structures. Experiments illustrate the benefits of learning via a shared structure, extending to the more general control problem.
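As a rough illustration of the setup described above, the following is a minimal sketch, assuming a toy feature-level Markov chain, of single-timescale TD learning in which each agent i models its value function as V_i(s) = phi(s)^T B w_i with a shared basis B and a personalized head w_i. The environment, step size, dimensions, and the QR-based re-orthonormalization are illustrative assumptions, not the authors' exact algorithm.
```python
# Hedged sketch: cooperative average-reward TD with a shared subspace B and
# personalized heads w_i. All names and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, DIM, RANK = 8, 20, 3
B = np.linalg.qr(rng.normal(size=(DIM, RANK)))[0]   # shared orthonormal basis
W = rng.normal(size=(N_AGENTS, RANK))               # personalized heads w_i
rho = np.zeros(N_AGENTS)                            # average-reward estimates
theta_true = rng.normal(size=(N_AGENTS, DIM))       # hidden per-agent reward params

def env_step(i, phi):
    """Toy mixing Markov chain over feature vectors (assumption)."""
    phi_next = 0.9 * phi + 0.1 * rng.normal(size=DIM)
    return phi_next, float(theta_true[i] @ phi) + 0.01 * rng.normal()

alpha = 0.01                                        # single shared step size
feats = [rng.normal(size=DIM) for _ in range(N_AGENTS)]
for t in range(2000):
    basis_signal = np.zeros_like(B)
    for i in range(N_AGENTS):
        phi = feats[i]
        phi_next, r = env_step(i, phi)
        # average-reward TD error: delta = r - rho_i + (phi' - phi)^T B w_i
        delta = r - rho[i] + (phi_next - phi) @ B @ W[i]
        rho[i] += alpha * delta                          # average-reward tracking
        W[i] += alpha * delta * (B.T @ phi)              # local head step
        basis_signal += delta * np.outer(phi, W[i])      # shared-subspace signal
        feats[i] = phi_next
    # aggregate across agents and re-orthonormalize the shared basis
    B = np.linalg.qr(B + alpha * basis_signal / N_AGENTS)[0]
```
The aggregation step pools each agent's TD signal before updating B, which is where conflicting ("misaligned") per-agent signals can average out while the heads w_i absorb the personalization.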
Related papers
- Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency [52.52950138164424]
We show that when leveraging off-the-shelf (vision) foundation models for feature extraction, the geometric shapes of the resulting feature distributions exhibit remarkable transferability across domains and datasets. We embody our geometric knowledge-guided distribution calibration framework in two popular and challenging settings: federated learning and long-tailed recognition. In long-tailed learning, it utilizes the geometric knowledge transferred from sample-rich categories to recover the true distribution for sample-scarce tail classes.
arXiv Detail & Related papers (2025-08-19T05:22:59Z)
- Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing significantly mitigate environment-specific errors (a hedged sketch of this averaging scheme follows this entry).
arXiv Detail & Related papers (2025-03-21T18:06:28Z)
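As context for the entry above, here is a minimal sketch, under toy assumptions (agent-specific mixing rates, a shared reward signal, and a fixed averaging period), of federated TD(0) in which agents with slightly mismatched dynamics periodically average their value-function weights. None of the names or constants come from the paper.
```python
# Hedged sketch: federated TD(0) with periodic parameter averaging across
# agents whose transition dynamics differ slightly. All values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, DIM, SYNC_EVERY, GAMMA, LR = 5, 10, 20, 0.95, 0.01
theta = np.zeros((N_AGENTS, DIM))                # per-agent value weights
mix = 0.8 + 0.04 * np.arange(N_AGENTS)           # slightly mismatched dynamics

def step(i, s):
    """Toy per-agent Markov chain with an agent-specific mixing rate (assumption)."""
    s_next = mix[i] * s + (1 - mix[i]) * rng.normal(size=DIM)
    return s_next, float(s.sum())                # shared reward signal

states = [rng.normal(size=DIM) for _ in range(N_AGENTS)]
for t in range(1000):
    for i in range(N_AGENTS):
        s = states[i]
        s_next, r = step(i, s)
        delta = r + GAMMA * s_next @ theta[i] - s @ theta[i]   # TD error
        theta[i] += LR * delta * s                              # local TD(0) step
        states[i] = s_next
    if t % SYNC_EVERY == 0:
        theta[:] = theta.mean(axis=0)            # information sharing: average
```
The single averaging line is the "moderate information sharing"; increasing SYNC_EVERY trades communication for more environment-specific drift between rounds.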
- SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding [11.38008343729117]
The Multi-Agent Path Finding (MAPF) problem is a core challenge for robotic deployments in large-scale logistics and transportation. We introduce a new framework that applies sheaf theory to decentralized deep reinforcement learning. Our proposed method demonstrates significant improvements over state-of-the-art learning-based MAPF planners.
arXiv Detail & Related papers (2025-02-10T13:17:34Z)
- Coordination Failure in Cooperative Offline MARL [3.623224034411137]
We focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data.
By using two-player games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms.
We propose an approach to mitigate such failure by prioritising samples from the dataset based on joint-action similarity (a hedged sketch of this prioritisation follows this entry).
arXiv Detail & Related papers (2024-07-01T14:51:29Z)
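The prioritisation idea from the entry above might look roughly like the following sketch: offline transitions whose joint actions are similar to the current joint policy's actions are sampled with higher probability. The dataset shape, similarity kernel, and softmax temperature are illustrative assumptions, not the paper's exact design.
```python
# Hedged sketch: prioritised minibatch sampling by joint-action similarity.
import numpy as np

rng = np.random.default_rng(2)
N, N_AGENTS, ACT_DIM = 500, 2, 3
dataset_actions = rng.normal(size=(N, N_AGENTS, ACT_DIM))   # offline joint actions

def policy_actions():
    """Stand-in for the current joint policy's actions (assumption)."""
    return rng.normal(size=(N_AGENTS, ACT_DIM))

current = policy_actions()
# similarity: negative squared distance between each stored joint action
# and the policy's current joint action
sim = -((dataset_actions - current) ** 2).sum(axis=(1, 2))
temp = 1.0                                                   # softmax temperature
probs = np.exp((sim - sim.max()) / temp)
probs /= probs.sum()
batch_idx = rng.choice(N, size=64, p=probs)                  # prioritised minibatch
```
Upweighting near-policy joint actions keeps the gradient evaluated on joint actions the current policies would actually take, which is the failure mode the paper attributes to best-response-style (BRUD) updates.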
- The Curse of Diversity in Ensemble-Based Exploration [7.209197316045156]
Training a diverse ensemble of data-sharing agents can significantly impair the performance of the individual ensemble members.
We name this phenomenon the curse of diversity.
We demonstrate the potential of representation learning to counteract the curse of diversity.
arXiv Detail & Related papers (2024-05-07T14:14:50Z)
- Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning [9.571499333904969]
Auxiliary tasks facilitate learning in situations where data is scarce or the principal task of interest is extremely complex. We propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover a new, unrelated auxiliary classification task. The disentanglement procedure works at the representation level, isolating the variation related to the principal task in a separate subspace.
arXiv Detail & Related papers (2023-10-13T17:40:39Z)
- Consistency and Diversity induced Human Motion Segmentation [231.36289425663702]
We propose a novel Consistency and Diversity induced human Motion (CDMS) algorithm.
Our model factorizes the source and target data into distinct multi-layer feature spaces.
A multi-mutual learning strategy is carried out to reduce the domain gap between the source and target data.
arXiv Detail & Related papers (2022-02-10T06:23:56Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Continual Learning in Low-rank Orthogonal Subspaces [86.36417214618575]
In continual learning (CL), a learner is faced with a sequence of tasks, arriving one after the other, and the goal is to remember all the tasks once the learning experience is finished.
The prior art in CL uses episodic memory, parameter regularization or network structures to reduce interference among tasks, but in the end, all the approaches learn different tasks in a joint vector space.
We propose to learn tasks in different (low-rank) vector subspaces that are kept orthogonal to each other in order to minimize interference (a hedged sketch of such orthogonal-subspace updates follows this entry).
arXiv Detail & Related papers (2020-10-22T12:07:43Z)
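The orthogonal-subspace idea from the entry above can be sketched as follows: each task's updates are confined to its own low-rank basis, and because the bases are mutually orthogonal, updates for one task leave the components learned for earlier tasks untouched. The quadratic per-task loss and all dimensions are illustrative assumptions.
```python
# Hedged sketch: continual learning in mutually orthogonal low-rank subspaces.
import numpy as np

rng = np.random.default_rng(3)
DIM, RANK, N_TASKS = 32, 4, 3
# one orthonormal basis per task; the bases are mutually orthogonal because
# they are disjoint column blocks of a single QR factorization
Q = np.linalg.qr(rng.normal(size=(DIM, RANK * N_TASKS)))[0]
bases = [Q[:, t * RANK:(t + 1) * RANK] for t in range(N_TASKS)]

w = np.zeros(DIM)                                   # shared parameter vector
targets = [rng.normal(size=DIM) for _ in range(N_TASKS)]
for t, (B, target) in enumerate(zip(bases, targets)):
    for step in range(200):
        grad = w - target                           # gradient of 0.5 * ||w - target||^2
        w -= 0.1 * (B @ (B.T @ grad))               # update only within task t's subspace
    # since B_s is orthogonal to B_t for s != t, the projection B_s^T w of
    # every earlier task s is unchanged by task t's updates: no interference
```
Confining each task's update to span(B_t) is what replaces the joint vector space that prior CL methods share across tasks.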