PluralLLM: Pluralistic Alignment in LLMs via Federated Learning
- URL: http://arxiv.org/abs/2503.09925v1
- Date: Thu, 13 Mar 2025 00:45:27 GMT
- Title: PluralLLM: Pluralistic Alignment in LLMs via Federated Learning
- Authors: Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki
- Abstract summary: We introduce PluralLLM, a federated learning-based approach that enables multiple user groups to collaboratively train a transformer-based preference predictor without sharing sensitive data. Our method leverages Federated Averaging (FedAvg) to aggregate preference updates efficiently, achieving 46% faster convergence, a 4% improvement in alignment scores, and nearly the same group fairness measure as in centralized training.
- Score: 7.752864126266439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring Large Language Models (LLMs) align with diverse human preferences while preserving privacy and fairness remains a challenge. Existing methods, such as Reinforcement Learning from Human Feedback (RLHF), rely on centralized data collection, making them computationally expensive and privacy-invasive. We introduce PluralLLM, a federated learning-based approach that enables multiple user groups to collaboratively train a transformer-based preference predictor without sharing sensitive data, which can also serve as a reward model for aligning LLMs. Our method leverages Federated Averaging (FedAvg) to aggregate preference updates efficiently, achieving 46% faster convergence, a 4% improvement in alignment scores, and nearly the same group fairness measure as in centralized training. Evaluated on a Q/A preference alignment task, PluralLLM demonstrates that federated preference learning offers a scalable and privacy-preserving alternative for aligning LLMs with diverse human values.
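The core mechanism described in the abstract is ordinary Federated Averaging applied to a preference predictor: each user group fits the model locally on its own preference pairs, only parameter updates leave the group, and the server averages them weighted by local data size. Below is a minimal PyTorch sketch of one such round; the model interface, the group data loaders, and the pairwise Bradley-Terry-style loss are illustrative assumptions rather than the authors' released implementation.

```python
# Minimal FedAvg sketch for federated preference learning (illustrative only).
# The preference-predictor interface, group data loaders, and pairwise loss
# below are assumptions for demonstration, not the paper's released code.
import copy
import torch
import torch.nn.functional as F


def local_update(global_model, group_loader, lr=1e-4, epochs=1):
    """One user group's local training on its private preference pairs."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for chosen, rejected in group_loader:  # features of preferred / dispreferred answers
            # Pairwise (Bradley-Terry-style) loss: the preferred answer should score higher.
            loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model.state_dict(), len(group_loader.dataset)


def fedavg_round(global_model, group_loaders):
    """One communication round: average group updates, weighted by local data size."""
    states, sizes = zip(*(local_update(global_model, dl) for dl in group_loaders))
    total = float(sum(sizes))
    avg_state = {
        key: sum(state[key].float() * (n / total) for state, n in zip(states, sizes))
        for key in states[0]
    }
    global_model.load_state_dict(avg_state)
    return global_model  # broadcast back to all groups for the next round
```

Repeating such rounds, with the averaged predictor broadcast back to the groups, is the setting the abstract compares against centralized training (46% faster convergence, a 4% improvement in alignment scores, and comparable group fairness).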
Related papers
- Adaptive Cluster Collaborativeness Boosts LLMs Medical Decision Support Capacity [24.722167779987814]
Large language models (LLMs) have proven effective in natural language processing systems. We propose an adaptive cluster collaborativeness methodology involving self-diversity and cross-consistency mechanisms. Our method achieves accuracy up to the official passing score across all disciplines.
arXiv Detail & Related papers (2025-07-25T04:21:16Z)
- Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment [24.419502686973495]
We introduce a flexible paradigm for individual preference alignment.
We validate our approach across multiple text generation tasks and demonstrate that it achieves alignment quality as good as or better than PEFT-based methods.
arXiv Detail & Related papers (2024-12-30T09:58:31Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Aligning LLMs with Individual Preferences via Interaction [51.72200436159636]
We train large language models (LLMs) that can "interact to align." We develop a multi-turn preference dataset containing 3K+ multi-turn conversations in tree structures. For evaluation, we establish the ALOE benchmark, consisting of 100 carefully selected examples and well-designed metrics to measure the customized alignment performance during conversations.
arXiv Detail & Related papers (2024-10-04T17:48:29Z)
- Towards Federated RLHF with Aggregated Client Preference for LLMs [16.97734775088073]
Reinforcement learning with human feedback (RLHF) fine-tunes a pretrained large language model (LLM) using user preference data.
Due to privacy concerns, users may be reluctant to share sensitive preference data.
We propose utilizing Federated Learning (FL) techniques, allowing large-scale preference collection from diverse real-world users.
arXiv Detail & Related papers (2024-07-03T12:02:24Z)
- Pareto-Optimal Learning from Preferences with Hidden Context [17.590330740964266]
We propose POPL, which enables pluralistic alignment by framing discrepant group preferences as objectives with potential trade-offs. Our theoretical and empirical evaluations demonstrate that POPL surpasses baseline methods in learning sets of reward functions and policies. We illustrate that POPL can also serve as a foundation for techniques optimizing specific notions of group fairness.
arXiv Detail & Related papers (2024-06-21T18:57:38Z)
- MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data. We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences. Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- On Diversified Preferences of Large Language Model Alignment [51.26149027399505]
This paper presents the first quantitative analysis of experimental scaling laws for reward models of varying sizes.
Our analysis reveals that the impact of diversified human preferences depends on both model size and data size.
Larger models with sufficient capacity mitigate the negative effects of diverse preferences, while smaller models struggle to accommodate them.
arXiv Detail & Related papers (2023-12-12T16:17:15Z)
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging [148.77027765872006]
We study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem.
LLMs are aligned to multiple preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
We show that we can achieve personalized alignment by decomposing preferences into multiple dimensions.
arXiv Detail & Related papers (2023-10-17T20:22:13Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- FedPC: Federated Learning for Language Generation with Personal and Context Preference Embeddings [10.235620939242505]
Federated learning is a training paradigm that learns from multiple distributed users without aggregating data on a centralized server.
We propose a new direction for personalization research within federated learning, leveraging both personal embeddings and shared context embeddings.
We present an approach to predict these "preference" embeddings, enabling personalization without backpropagation.
arXiv Detail & Related papers (2022-10-07T18:01:19Z)
- Multi-Center Federated Learning [62.57229809407692]
This paper proposes a novel multi-center aggregation mechanism for federated learning.
It learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers.
Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
arXiv Detail & Related papers (2020-05-03T09:14:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.