Related papers: FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Client

URL: http://arxiv.org/abs/2602.12014v1
Date: Thu, 12 Feb 2026 14:45:56 GMT
Title: FedGRPO: Privately Optimizing Foundation Models with Group-Relative Rewards from Domain Client
Authors: Gongxi Zhu, Hanlin Gu, Lixin Fan, Qiang Yang, Yuxing Han
Abstract summary: Existing methods based on model-level or representation-level knowledge transfer either require expensive local training or incur high communication costs. We reformulate this problem as a reinforcement-learning-style evaluation process and propose FedGRPO. FedGRPO achieves superior downstream accuracy and communication efficiency compared to conventional FedFMs baselines.
Score: 21.08829811371245
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One important direction of Federated Foundation Models (FedFMs) is leveraging data from small client models to enhance the performance of a large server-side foundation model. Existing methods based on model-level or representation-level knowledge transfer either require expensive local training or incur high communication costs and introduce unavoidable privacy risks. We reformulate this problem as a reinforcement-learning-style evaluation process and propose FedGRPO, a privacy-preserving framework comprising two modules. The first module performs competence-based expert selection by building a lightweight confidence graph from auxiliary data to identify the most suitable clients for each question. The second module leverages the "Group Relative" concept from the Group Relative Policy Optimization (GRPO) framework by packaging each question together with its solution rationale into candidate policies, dispatching these policies to a selected subset of expert clients, and aggregating solely the resulting scalar reward signals via a federated group-relative loss function. By exchanging reward values instead of data or model updates, FedGRPO reduces privacy risk and communication overhead while enabling parallel evaluation across heterogeneous devices. Empirical results on diverse domain tasks demonstrate that FedGRPO achieves superior downstream accuracy and communication efficiency compared to conventional FedFMs baselines.
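
The "group-relative" idea in the abstract can be illustrated with a minimal sketch. It assumes that each selected client returns one scalar reward per candidate, that the server pools them with a plain average, and that advantages are then computed with the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The abstract does not specify the actual federated loss, so the pooling rule and the function names `aggregate_client_rewards` and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_client_rewards(client_rewards):
    """Average, per candidate, the scalar rewards returned by the clients.

    client_rewards: one list per client, each holding one reward per candidate.
    Plain averaging is an assumption made for this sketch.
    """
    n_candidates = len(client_rewards[0])
    return [mean(c[j] for c in client_rewards) for j in range(n_candidates)]


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style normalization within one group of candidates."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two hypothetical expert clients scoring four candidates for one question.
client_rewards = [
    [1.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 1.0, 0.0],
]
pooled = aggregate_client_rewards(client_rewards)
advantages = group_relative_advantages(pooled)
print(pooled)
print(advantages)
```

Only the reward lists cross the network in this sketch; no client data or model weights are exchanged, which mirrors the communication pattern the abstract describes.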

Related papers

FeDecider: An LLM-Based Framework for Federated Cross-Domain Recommendation [75.50721642765994]
Large language model (LLM)-based recommendation models have demonstrated impressive performance. We propose an LLM-based framework for Federated cross-domain recommendation, FeDecider. Extensive experiments across diverse datasets validate the effectiveness of our proposed FeDecider.
arXiv Detail & Related papers (2026-02-17T21:42:28Z)
Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models [63.70401095689976]
We argue that replacing parameters with preferences represents a more scalable and privacy-preserving future. We propose MoR, a federated alignment framework based on GRPO with Mixture-of-Rewards for heterogeneous VLMs. MoR consistently outperforms federated alignment baselines in generalization, robustness, and cross-client adaptability.
arXiv Detail & Related papers (2026-01-31T03:11:51Z)
Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework [57.04850867402913]
Federated clustering addresses the challenge of extracting patterns from decentralized, unlabeled data. We propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing. Our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.
arXiv Detail & Related papers (2025-11-14T03:05:22Z)
STT-GS: Sample-Then-Transmit Edge Gaussian Splatting with Joint Client Selection and Power Control [77.56170394100022]
Edge Gaussian splatting (EGS) aggregates data from distributed clients and trains a global GS model at the edge server. This paper formulates a novel GS-oriented objective function that distinguishes the view contributions of different clients. It is found that the GS-oriented objective can be accurately predicted with low sampling ratios.
arXiv Detail & Related papers (2025-10-15T06:20:47Z)
PQFed: A Privacy-Preserving Quality-Controlled Federated Learning Framework [3.279539373700685]
Federated learning enables collaborative model training without sharing raw data. PQFed is a privacy-preserving personalized federated learning framework. PQFed consistently improves the target client's model performance, even with a limited number of participants.
arXiv Detail & Related papers (2025-09-25T23:56:24Z)
Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning [1.3270838622986498]
Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy by keeping data local. Traditional FL approaches rely on a centralized, star-shaped topology, where a central server aggregates model updates from clients. We propose a decentralized, peer-to-peer (P2P) FL framework to enable each client to identify and aggregate a personalized set of trustworthy and beneficial updates.
arXiv Detail & Related papers (2025-08-07T10:10:37Z)
DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data [65.09939942413651]
We propose a principled extension to GRPO that addresses inter-group imbalance with two key innovations. Domain-aware reward scaling counteracts frequency bias by reweighting optimization based on domain prevalence. Difficulty-aware reward scaling leverages prompt-level self-consistency to identify and prioritize uncertain prompts that offer greater learning value.
arXiv Detail & Related papers (2025-05-21T03:43:29Z)
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization [50.91849555841057]
Group Relative Policy Optimization is a reinforcement learning method for large reasoning models (LRMs). We introduce a new Discriminative Constrained Optimization framework for reinforcing LRMs, grounded in the principle of discriminative learning. DisCO significantly outperforms GRPO and its improved variants such as DAPO, achieving average gains of 7% over GRPO and 6% over DAPO.
arXiv Detail & Related papers (2025-05-18T11:08:32Z)
Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private. We propose Client-Centric Federated Adaptive Optimization, which is a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z)
FedSpaLLM: Federated Pruning of Large Language Models [8.45879077052023]
Large Language Models (LLMs) achieve state-of-the-art performance but are challenging to deploy due to their high computational and storage demands. We propose FedSpaLLM, the first federated learning framework designed specifically for pruning LLMs.
arXiv Detail & Related papers (2024-10-18T20:33:12Z)
FedSPLIT: One-Shot Federated Recommendation System Based on Non-negative Joint Matrix Factorization and Knowledge Distillation [7.621960305708476]
We present the first unsupervised one-shot federated CF implementation, named FedSPLIT, based on NMF joint factorization. FedSPLIT can obtain results similar to the state of the art (and even outperform it in certain situations) with a substantial decrease in the number of communications.
arXiv Detail & Related papers (2022-05-04T23:42:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.