Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems
- URL: http://arxiv.org/abs/2510.21427v1
- Date: Fri, 24 Oct 2025 13:06:43 GMT
- Title: Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems
- Authors: Hao Liang, Shuqing Shi, Yudi Zhang, Biwei Huang, Yali Du
- Abstract summary: Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. We propose GSAC, a framework that couples causal representation learning with meta actor-critic learning to achieve both scalability and domain generalization. We show that GSAC adapts rapidly and significantly outperforms learning-from-scratch and conventional adaptation baselines.
- Score: 26.67939638191807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. To address these challenges, we propose GSAC (Generalizable and Scalable Actor-Critic), a framework that couples causal representation learning with meta actor-critic learning to achieve both scalability and domain generalization. Each agent first learns a sparse local causal mask that provably identifies the minimal neighborhood variables influencing its dynamics, yielding exponentially tight approximately compact representations (ACRs) of state and domain factors. These ACRs bound the error of truncating value functions to $\kappa$-hop neighborhoods, enabling efficient learning on graphs. A meta actor-critic then trains a shared policy across multiple source domains while conditioning on the compact domain factors; at test time, a few trajectories suffice to estimate the new domain factor and deploy the adapted policy. We establish finite-sample guarantees on causal recovery, actor-critic convergence, and adaptation gap, and show that GSAC adapts rapidly and significantly outperforms learning-from-scratch and conventional adaptation baselines.
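The abstract relies on truncating value functions to $\kappa$-hop neighborhoods. As a minimal sketch of what such a neighborhood is (not the authors' implementation), the following breadth-first search collects every agent within `kappa` hops of a given agent. The function name `k_hop_neighborhood`, the adjacency-dict representation, and the 6-agent line graph are all assumptions made for illustration.

```python
from collections import deque


def k_hop_neighborhood(adjacency, source, kappa):
    """Return the set of nodes within `kappa` hops of `source`, source included.

    `adjacency` is a hypothetical dict mapping each node to an iterable of
    its direct neighbors; it stands in for the agent interaction graph.
    """
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            # Nodes at distance kappa are kept but not expanded further.
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


# Illustrative 6-agent line graph 0-1-2-3-4-5 (made up for this example).
line_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

if __name__ == "__main__":
    print(sorted(k_hop_neighborhood(line_graph, 2, 1)))
```

In a truncated scheme of this kind, an agent's critic would read only the states and actions of the agents returned by such a lookup, so its input size depends on the neighborhood rather than on the whole network.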
Related papers
- Constrained Adversarial Perturbation [16.05659740749269]
Universal Adversarial Perturbations (UAPs) have emerged as a powerful tool for both stress testing model robustness and scalable adversarial training. We propose Constrained Adversarial Perturbation (CAP), an efficient algorithm that solves this problem using a gradient-based alternating optimization strategy.
arXiv Detail & Related papers (2025-10-17T14:44:20Z)
- Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement Learning [16.190458233440864]
We propose a graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. We introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents.
arXiv Detail & Related papers (2025-09-20T10:09:37Z)
- DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data [65.09939942413651]
We propose a principled extension to GRPO that addresses inter-group imbalance with two key innovations. Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence. Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value.
arXiv Detail & Related papers (2025-05-21T03:43:29Z)
- Hierarchical Local-Global Feature Learning for Few-shot Malicious Traffic Detection [6.118242543398087]
Malicious network attacks have become increasingly frequent and sophisticated. Traditional detection methods, including rule-based and machine learning-based approaches, struggle to accurately identify emerging threats. We propose HLoG, a novel hierarchical few-shot malicious traffic detection framework.
arXiv Detail & Related papers (2025-04-01T14:56:44Z)
- Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaption across tasks. In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change. We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z)
- Cross-Domain Continual Learning via CLAMP [10.553456651003055]
CLAMP significantly outperforms established baseline algorithms across all experiments by a margin of at least 10%.
An assessor-guided learning process is put forward to navigate the learning process of a base model.
arXiv Detail & Related papers (2024-05-12T02:41:31Z)
- Deep face recognition with clustering based domain adaptation [57.29464116557734]
We propose a new clustering-based domain adaptation method designed for the face recognition task, in which the source and target domains do not share any classes.
Our method effectively learns the discriminative target feature by aligning the feature domain globally while distinguishing the target clusters locally.
arXiv Detail & Related papers (2022-05-27T12:29:11Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
- Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network [58.05473757538834]
This paper proposes a novel adversarial scoring network (ASNet) to bridge the gap across domains from coarse to fine granularity.
Three sets of migration experiments show that the proposed methods achieve state-of-the-art counting performance.
arXiv Detail & Related papers (2021-07-27T14:47:24Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification.
Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)