FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data
- URL: http://arxiv.org/abs/2512.02350v1
- Date: Tue, 02 Dec 2025 02:35:55 GMT
- Title: FOVA: Offline Federated Reinforcement Learning with Mixed-Quality Data
- Authors: Nan Qiao, Sheng Yue, Ju Ren, Yaoxue Zhang
- Abstract summary: This paper introduces FOVA, a new vote-based framework for offline Federated Reinforcement Learning (FRL). It exploits a vote mechanism to identify high-return actions during local policy evaluation, alleviating the negative effect of low-quality behaviors from diverse local learning policies.
- Score: 22.024833953049313
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline Federated Reinforcement Learning (FRL), a marriage of federated learning and offline reinforcement learning, has attracted increasing interest recently. Despite some recent advances, we find that the performance of most existing offline FRL methods drops dramatically when provided with mixed-quality data, that is, when the logging behaviors (offline data) are collected by policies of varying quality across clients. To overcome this limitation, this paper introduces a new vote-based offline FRL framework, named FOVA. It exploits a \emph{vote mechanism} to identify high-return actions during local policy evaluation, alleviating the negative effect of low-quality behaviors from diverse local learning policies. In addition, building on advantage-weighted regression (AWR), we construct consistent local and global training objectives, significantly enhancing the efficiency and stability of FOVA. Further, we conduct an extensive theoretical analysis and rigorously show that the policy learned by FOVA enjoys strict policy improvement over the behavioral policy. Extensive experiments corroborate the significant performance gains of our proposed algorithm over existing baselines on widely used benchmarks.
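As a rough illustration of the two ingredients described above, the sketch below combines an AWR-style weighted log-likelihood update with a simple vote over per-client critics. The class and function names, and the element-wise max used as the vote, are assumptions made for exposition, not FOVA's exact procedure.

```python
# Illustrative sketch only: an AWR-style local policy update plus a simple
# "vote" across client critics. Names and the max-based voting rule are
# assumptions, not the exact FOVA formulation.
import torch
import torch.nn as nn
from torch.distributions import Normal


class GaussianActor(nn.Module):
    """Minimal Gaussian policy, included only to make the sketch self-contained."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, action_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, states):
        return Normal(self.net(states), self.log_std.exp())


def voted_q_value(client_critics, states, actions):
    # "Vote" across the clients' critics: keep the most optimistic estimate per
    # sample, so low-return local behaviors influence the evaluation target less.
    estimates = torch.stack([q(states, actions) for q in client_critics], dim=0)
    return estimates.max(dim=0).values


def awr_policy_loss(actor, states, actions, advantages, beta=1.0, clip=20.0):
    # Advantage-weighted regression: behavior-cloning log-likelihood weighted by
    # exp(advantage / beta); the same form can serve as local and global objective.
    weights = torch.clamp(torch.exp(advantages / beta), max=clip).detach()
    log_prob = actor(states).log_prob(actions).sum(-1)
    return -(weights * log_prob).mean()
```

In a federated round, each client would evaluate advantages against the voted value and update its local actor with this loss before the server aggregates actor parameters (e.g., FedAvg-style averaging in this sketch).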
Related papers
- Causal Flow Q-Learning for Robust Offline Reinforcement Learning [53.63254824501714]
We introduce a practical implementation that learns expressive flow-matching policies from confounded demonstrations. The proposed confounding-robust augmentation procedure achieves 120% of the success rate of confounding-unaware, state-of-the-art offline RL methods.
arXiv Detail & Related papers (2026-02-02T21:50:52Z)
- A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning [45.19254609437857]
Online reinforcement learning (RL) excels in complex, safety-critical domains but suffers from sample inefficiency, training instability, and limited interpretability. Data attribution provides a principled way to trace model behavior back to training samples. We propose iterative influence-based filtering (IIF), an algorithm for online RL training that iteratively filters experiences to refine policy updates.
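A minimal sketch of the filter-then-update loop suggested by this summary; `influence_fn` is a placeholder for whatever attribution score is used, and the quantile cutoff is an illustrative choice.

```python
# Sketch of an iterative experience-filtering loop; the influence score itself
# (influence_fn) is a placeholder, not the paper's estimator.
import numpy as np


def filter_by_influence(transitions, influence_fn, keep_fraction=0.8):
    scores = np.array([influence_fn(t) for t in transitions])
    cutoff = np.quantile(scores, 1.0 - keep_fraction)
    return [t for t, s in zip(transitions, scores) if s >= cutoff]


def iterative_influence_filtering(buffer, update_policy, influence_fn, rounds=10):
    data = list(buffer)
    for _ in range(rounds):
        data = filter_by_influence(data, influence_fn)  # score and prune experiences
        update_policy(data)                             # refine the policy on what remains
    return data
```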
arXiv Detail & Related papers (2025-05-25T19:25:57Z)
- Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL [42.57662196581823]
Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks.
Most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer.
We present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy.
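One way to picture the adaptive blending is a per-state gate that adds a regularization term toward the buffer-derived (offline) policy only where that policy is valued higher; the hard 0/1 gate below is an assumption made for brevity, not OBAC's exact mechanism.

```python
# Illustrative per-state gating between an online actor objective and a
# regularizer toward the buffer-derived (offline) policy; the hard gate is an
# assumption for exposition.
import torch


def gated_actor_loss(actor_loss, bc_to_offline_loss, v_online, v_offline):
    """actor_loss: scalar online objective; the other arguments are per-state
    tensors of shape [batch]."""
    with torch.no_grad():
        gate = (v_offline > v_online).float()  # 1 where the offline policy looks better
    return actor_loss + (gate * bc_to_offline_loss).mean()
```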
arXiv Detail & Related papers (2024-05-28T18:38:46Z)
- Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows [30.926243761581624]
Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training.
CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation.
Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
arXiv Detail & Related papers (2024-05-06T22:44:32Z)
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
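Since the summary stresses that the change fits in a few lines, here is a small sketch of return-weighted episode resampling; the softmax weighting and the "reward" field name are assumptions, not necessarily the paper's exact scheme.

```python
# Sketch of return-based resampling: episodes are drawn with probability that
# increases with their return, leaving the dataset's support unchanged.
# The softmax weighting and the "reward" field name are assumptions.
import numpy as np


def return_based_resample(episodes, num_samples, temperature=1.0, seed=0):
    rng = np.random.default_rng(seed)
    returns = np.array([sum(step["reward"] for step in ep) for ep in episodes])
    z = (returns - returns.mean()) / (returns.std() + 1e-8)   # normalize returns
    probs = np.exp(z / temperature)
    probs /= probs.sum()
    idx = rng.choice(len(episodes), size=num_samples, p=probs)
    return [episodes[i] for i in idx]
```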
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Mutual Information Regularized Offline Reinforcement Learning [76.05299071490913]
We propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset.
We show that optimizing this lower bound is equivalent to maximizing the likelihood of a one-step improved policy on the offline dataset.
We introduce 3 different variants of MISA, and empirically demonstrate that a tighter mutual information lower bound gives better offline RL performance.
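The mutual-information idea can be pictured with a generic InfoNCE-style lower bound over the dataset's state-action pairs; this is a standard estimator shown here only for illustration, not the specific bound derived in the paper.

```python
# Generic InfoNCE-style lower bound on I(S; A): the dataset's (s, a) pairs are
# positives, all other in-batch pairings are negatives. Illustration only.
import torch
import torch.nn.functional as F


def infonce_state_action_bound(score_fn, states, actions):
    """score_fn(s, a) -> per-pair scalar score, evaluated on all B x B pairings."""
    B = states.size(0)
    s_rep = states.repeat_interleave(B, dim=0)       # [B*B, state_dim]
    a_rep = actions.repeat(B, 1)                     # [B*B, action_dim]
    logits = score_fn(s_rep, a_rep).view(B, B)       # row i: state i against every action
    labels = torch.arange(B, device=states.device)   # positives lie on the diagonal
    return -F.cross_entropy(logits, labels)          # higher value = tighter bound
```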
arXiv Detail & Related papers (2022-10-14T03:22:43Z)
- Data Valuation for Offline Reinforcement Learning [1.3535770763481902]
The field of offline reinforcement learning addresses these issues by outsourcing the collection of data to a domain expert or a carefully monitored program.
With the emergence of data markets, an alternative to constructing a dataset in-house is to purchase external data.
This raises questions regarding the transferability and robustness of an offline reinforcement learning agent trained on externally acquired data.
arXiv Detail & Related papers (2022-05-19T13:21:40Z)
- Dealing with the Unknown: Pessimistic Offline Reinforcement Learning [25.30634466168587]
We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to regions with which it is familiar.
We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset.
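A hedged sketch of such a state-level value penalty: values at states not covered by the dataset are pushed down relative to in-distribution states. How the OOD states are generated (perturbations, a density model, etc.) is left abstract, and the gap-style penalty is an assumption.

```python
# Sketch of a pessimistic, state-level value penalty: lower values at states
# outside the dataset relative to in-distribution states. How `ood_states`
# are obtained is left abstract; the gap-style penalty is an assumption.
import torch


def pessimistic_state_penalty(value_fn, in_dist_states, ood_states, alpha=1.0):
    gap = value_fn(ood_states).mean() - value_fn(in_dist_states).mean()
    return alpha * gap  # add to the critic loss to discourage high OOD values
```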
arXiv Detail & Related papers (2021-11-09T22:38:58Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
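One simple way to realize such down-weighting is to weight each Bellman error by a decreasing function of the target's predictive variance (e.g., from dropout passes or an ensemble); the exponential weighting below is an illustrative choice rather than the paper's exact scheme.

```python
# Sketch of uncertainty-based down-weighting of Bellman errors. The variance
# could come from MC dropout or an ensemble; exp(-var / beta) is an
# illustrative weighting, not necessarily the paper's exact scheme.
import torch


def uncertainty_weights(q_target_samples, beta=1.0):
    """q_target_samples: [num_samples, batch] target-Q estimates for the same pairs."""
    var = q_target_samples.var(dim=0)
    return torch.exp(-var / beta).detach()      # high variance -> small weight


def weighted_td_loss(q_pred, td_target, weights):
    return (weights * (q_pred - td_target.detach()) ** 2).mean()
```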
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is often either cost-ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty Regularized and Sample Efficient Batch Optimization (MUSBO).
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.