Pacing Opinion Polarization via Graph Reinforcement Learning
- URL: http://arxiv.org/abs/2602.23390v1
- Date: Mon, 23 Feb 2026 11:57:04 GMT
- Title: Pacing Opinion Polarization via Graph Reinforcement Learning
- Authors: Mingkai Liao,
- Abstract summary: PACIFIER is a graph reinforcement learning framework for sequential polarization moderation via network interventions.<n>It reformulates canonical problems as sequential decision-making tasks, enabling adaptive intervention policies without repeated steady-state recomputation.<n>Experiments on real-world networks demonstrate strong performance and scalability across diverse moderation scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Opinion polarization in online social networks poses serious risks to social cohesion and democratic processes. Recent studies formulate polarization moderation as algorithmic intervention problems under opinion dynamics models, especially the Friedkin--Johnsen (FJ) model. However, most existing methods are tailored to specific linear settings and rely on closed-form steady-state analysis, limiting scalability, flexibility, and applicability to cost-aware, nonlinear, or topology-altering interventions. We propose PACIFIER, a graph reinforcement learning framework for sequential polarization moderation via network interventions. PACIFIER reformulates the canonical ModerateInternal (MI) and ModerateExpressed (ME) problems as sequential decision-making tasks, enabling adaptive intervention policies without repeated steady-state recomputation. The framework is objective-agnostic and extends naturally to FJ-consistent settings, including budget-aware interventions, continuous internal opinions, biased-assimilation dynamics, and node removal. Extensive experiments on real-world networks demonstrate strong performance and scalability across diverse moderation scenarios.
Related papers
- On the Plasticity and Stability for Post-Training Large Language Models [54.757672540381236]
We identify a root cause as the conflict between plasticity and stability gradients.<n>We propose Probabilistic Conflict Resolution (PCR), a framework that models gradients as random variables.<n>PCR significantly smooths the training trajectory and achieves superior performance in various reasoning tasks.
arXiv Detail & Related papers (2026-02-06T07:31:26Z) - Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - Balance Equation-based Distributionally Robust Offline Imitation Learning [8.607736795429638]
Imitation Learning (IL) has proven highly effective for robotic and control tasks where manually designing reward functions or explicit controllers is infeasible.<n>Standard IL methods implicitly assume that the environment dynamics remain fixed between training and deployment.<n>We address this challenge through Balance Equation-based Distributionally Robust Offline Learning.<n>We formulate the problem as a distributionally robust optimization over an uncertainty set of transition models, seeking a policy that minimizes the imitation loss under the worst-case transition distribution.
arXiv Detail & Related papers (2025-11-11T07:48:09Z) - Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
contexts drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.<n>Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.<n>We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z) - On the System Theoretic Offline Learning of Continuous-Time LQR with Exogenous Disturbances [3.701656361145375]
We analyze offline designs of linear quadratic regulator (LQR) strategies with uncertain disturbances.<n>Our approach builds on the fundamental learning-based framework of adaptive dynamic programming.
arXiv Detail & Related papers (2025-09-20T17:14:27Z) - POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes [15.681058679765277]
We propose POLAR, a pessimistic model-based policy learning algorithm for offline dynamic treatment regimes (DTRs)<n> POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair.<n>Unlike many existing methods that focus on average training performance, POLAR directly targets the suboptimality of the final learned policy and offers theoretical guarantees.<n> Empirical results on both synthetic data and the MIMIC-III dataset demonstrate that POLAR outperforms state-of-the-art methods and yields near-optimal, history-aware treatment strategies.
arXiv Detail & Related papers (2025-06-25T13:22:57Z) - Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [53.03951222945921]
We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle.<n>Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error.<n>We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO)
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - Online Nonstochastic Model-Free Reinforcement Learning [35.377261344335736]
We investigate robust model robustness guarantees for environments that may be dynamic or adversarial.
We provide efficient and efficient algorithms for optimizing these policies.
These are the best-known developments in having no dependence on the state-space dimension in having no dependence on the state-space.
arXiv Detail & Related papers (2023-05-27T19:02:55Z) - DROMO: Distributionally Robust Offline Model-based Policy Optimization [0.0]
We consider the problem of offline reinforcement learning with model-based control.
We propose distributionally robust offline model-based policy optimization (DROMO)
arXiv Detail & Related papers (2021-09-15T13:25:14Z) - Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.