Related papers: FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models

FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models

URL: http://arxiv.org/abs/2508.10020v1
Date: Thu, 07 Aug 2025 06:50:15 GMT
Title: FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models
Authors: Chuan Li, Qianyi Zhao, Fengran Mo, Cen Chen,
Abstract summary: FedCoT is a novel framework specifically designed to enhance reasoning in federated settings.<n>It improves reasoning accuracy and robustness while providing valuable interpretability, which is critical for medical applications.
Score: 14.173704018103454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficiently enhancing the reasoning capabilities of large language models (LLMs) in federated learning environments remains challenging, particularly when balancing performance gains with strict computational, communication, and privacy constraints. This challenge is especially acute in healthcare, where decisions-spanning clinical, operational, and patient-facing contexts-demand not only accurate outputs but also interpretable, traceable rationales to ensure safety, accountability, and regulatory compliance. Conventional federated tuning approaches on LLM fail to address this need: they optimize primarily for answer correctness while neglecting rationale quality, leaving CoT capabilities dependent on models' innate pre-training abilities. Moreover, existing methods for improving rationales typically rely on privacy-violating knowledge distillation from centralized models. Additionally, the communication overhead in traditional federated fine-tuning on LLMs remains substantial. We addresses this gap by proposing FedCoT, a novel framework specifically designed to enhance reasoning in federated settings. FedCoT leverages a lightweight chain-of-thought enhancement mechanism: local models generate multiple reasoning paths, and a compact discriminator dynamically selects the most promising one. This approach improves reasoning accuracy and robustness while providing valuable interpretability, which is particularly critical for medical applications. To manage client heterogeneity efficiently, we adopt an improved aggregation approach building upon advanced LoRA module stacking, incorporating client classifier-awareness to achieve noise-free aggregation across diverse clients. Comprehensive experiments on medical reasoning tasks demonstrate that FedCoT significantly boosts client-side reasoning performance under stringent resource budgets while fully preserving data privacy.

Related papers

Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning [31.629261193485053]
Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world tasks.<n>Most existing outcome-only RLVR pipelines rely almost exclusively on a binary correctness signal and largely ignore the model's intrinsic uncertainty.<n>We propose EGPO, a metacognitive entropy calibration framework that explicitly integrates intrinsic uncertainty into RLVR for enhancing LRMs.
arXiv Detail & Related papers (2026-02-26T08:40:06Z)
Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes.<n>In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID.<n>We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z)
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models [19.266791462067655]
We introduce FIRM (Federated In-client Regularized Multi-objective alignment), a novel algorithm that achieves both client disagreement drift mitigation and communication efficiency.<n>We show that FIRM leads to smoother training dynamics, reduced client disagreement drift, and improved reward trade-offs compared to baselines.
arXiv Detail & Related papers (2025-11-21T06:59:28Z)
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios.<n>However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks.<n>We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z)
Federated In-Context Learning: Iterative Refinement for Improved Answer Quality [62.72381208029899]
In-context learning (ICL) enables language models to generate responses without modifying their parameters by leveraging examples provided in the input.<n>We propose Federated In-Context Learning (Fed-ICL), a general framework that enhances ICL through an iterative, collaborative process.<n>Fed-ICL progressively refines responses by leveraging multi-round interactions between clients and a central server, improving answer quality without the need to transmit model parameters.
arXiv Detail & Related papers (2025-06-09T05:33:28Z)
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization [55.06360285372418]
Group Relative Policy Optimization is a reinforcement learning method for large reasoning models (LRMs)<n>In this work, we analyze the GRPO objective under a binary reward setting and reveal an inherent limitation of question-level difficulty bias.<n>We introduce a new Discriminative Constrained Optimization framework for reinforcing LRMs, grounded in the principle of discriminative learning.
arXiv Detail & Related papers (2025-05-18T11:08:32Z)
Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling [48.15636223774418]
Large language models (LLMs) are prone to hallucination stemming from misaligned self-awareness.<n>We propose the Explicit Knowledge Boundary Modeling framework to integrate fast and slow reasoning systems to harmonize reliability and usability.
arXiv Detail & Related papers (2025-03-04T03:16:02Z)
Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets.<n>This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
On the Convergence of Zeroth-Order Federated Tuning for Large Language Models [36.277423093218275]
Federated Learning and Large Language Models (LLMs) are ushering in a new era in privacy-preserving natural language processing. Memory-efficient Zeroth-Order Optimization is a synergy we term as FedMeZO. Our study is the first to examine the theoretical underpinnings of FedMeZO in the context of LLMs.
arXiv Detail & Related papers (2024-02-08T18:56:40Z)
FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning [5.622065847054885]
Federated Learning (FL) has emerged as a promising approach for privacy-preserving model training across decentralized devices.<n>We introduce a novel method called textbfFedAA, which optimize client contributions via textbfAdaptive textbfAggregation to enhance model robustness against malicious clients.
arXiv Detail & Related papers (2024-02-08T10:22:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.