Fugu-MT Paper Translation (Abstract): VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

Paper abstract: VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

arxiv url: http://arxiv.org/abs/2603.18113v1
Date: Wed, 18 Mar 2026 14:05:51 GMT
Status: Translation complete
Last updated in system: 2026-03-20 17:19:05.775586
Title: VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
Title (reference translation): VC-Soup: Value-Consistency-Guided Multi-Value Alignment for Large Language Models
Authors: Hefei Xu, Le Wu, Yu Wang, Min Hou, Han Wu, Zhen Zhang, Meng Wang
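Abstract summary: This paper proposes VC-Soup, a data filtering and parameter merging framework based on value-consistent learning. It shows that VC-Soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
Reference score (independently computed attention score): 26.480803729157945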
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-soup, a data filtering and parameter merging framework grounded in value-consistent learning. We first design a value consistency metric based on the cosine similarity between the reward-gap vector of each preference pair and an all-ones vector, which quantifies its cross-value coherence. We then filter out low-consistency preference pairs in each value dataset and train on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, we linearly combine these policies and apply Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis demonstrate that VC-soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
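The value consistency metric described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `value_consistency` and `filter_pairs`, the threshold of 0.5, and the toy reward gaps are all assumptions made here. The abstract states only that the metric is the cosine similarity between a preference pair's reward-gap vector and an all-ones vector, and that low-consistency pairs are filtered out.

```python
import numpy as np


def value_consistency(reward_gaps, eps=1e-12):
    """Cosine similarity between each reward-gap vector and the all-ones vector.

    reward_gaps has shape (num_pairs, num_values). Entry [i, k] is assumed to be
    r_k(chosen_i) - r_k(rejected_i) under the reward model for value k.
    """
    gaps = np.asarray(reward_gaps, dtype=float)
    num_values = gaps.shape[1]
    norms = np.linalg.norm(gaps, axis=1)
    # cos(g, 1) = sum(g) / (||g|| * ||1||), and ||1|| = sqrt(num_values).
    # eps guards against division by zero for an all-zero gap vector.
    return gaps.sum(axis=1) / (np.sqrt(num_values) * np.maximum(norms, eps))


def filter_pairs(reward_gaps, threshold=0.5):
    """Return indices of pairs whose consistency meets the (assumed) threshold."""
    scores = value_consistency(reward_gaps)
    return np.flatnonzero(scores >= threshold), scores


# Toy reward gaps for four preference pairs under two values (made-up numbers).
gaps = np.array([
    [1.0, 1.0],    # both values prefer the chosen response equally
    [2.0, -2.0],   # the two values disagree
    [-1.0, -1.0],  # both values prefer the rejected response
    [3.0, 0.0],    # one value agrees, the other is indifferent
])
kept, scores = filter_pairs(gaps, threshold=0.5)
print(kept, scores)
```

A score near 1 means every value favors the chosen response by a similar margin, a score near 0 means the values pull in opposite directions, and a negative score means the values jointly favor the rejected response. How the paper actually sets the cutoff is not stated in the abstract.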
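Abstract (reference translation): As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central goal of trustworthy AI. The challenge becomes more pronounced when multiple, potentially conflicting human values must be aligned at once. Recent approaches such as reward reweighting, prompt-based supervised fine-tuning, and model merging attempt multi-value alignment, but they still face two major limitations: (1) training a separate model for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, the paper revisits multi-value alignment from the perspective of value consistency in data and proposes VC-Soup, a data filtering and parameter merging framework. First, it designs a value consistency metric, based on the cosine similarity between each preference pair's reward-gap vector and the all-ones vector, that quantifies the pair's cross-value coherence. Next, it filters out low-consistency preference pairs in each value dataset and trains on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, it linearly combines these policies and applies Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis show that VC-Soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.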
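The final stage of the abstract, linearly combining policies and then Pareto filtering across values, might look roughly like the sketch below. It assumes that each policy is a dictionary of NumPy parameter arrays and that per-value evaluation scores are already available. `merge_weights`, `pareto_front`, the toy parameters, and the scores are hypothetical names and numbers introduced for illustration; the abstract does not specify how merging coefficients are chosen or how candidates are evaluated.

```python
import numpy as np


def merge_weights(state_dicts, coeffs):
    """Linearly combine parameter dictionaries: theta = sum_i c_i * theta_i."""
    if len(state_dicts) != len(coeffs):
        raise ValueError("need exactly one coefficient per policy")
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, state_dicts))
        for name in state_dicts[0]
    }


def pareto_front(scores):
    """Indices of rows not dominated by any other row (higher is better)."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # A row dominates s if it is >= s on every value and > s on at least one.
        dominated = np.any(
            np.all(scores >= s, axis=1) & np.any(scores > s, axis=1)
        )
        if not dominated:
            keep.append(i)
    return keep


# Two toy "policies", each with a single parameter tensor (made-up values).
policy_a = {"w": np.array([1.0, 0.0])}
policy_b = {"w": np.array([0.0, 1.0])}
merged = merge_weights([policy_a, policy_b], [0.25, 0.75])

# Hypothetical per-value scores for four merged candidates (rows) on two values.
eval_scores = np.array([
    [0.9, 0.2],
    [0.6, 0.6],
    [0.5, 0.5],  # dominated by the row above
    [0.2, 0.9],
])
front = pareto_front(eval_scores)
print(merged["w"], front)
```

In practice one would sweep a grid of coefficients, evaluate each merged model on every value, and keep only the non-dominated candidates. The quadratic-time dominance check used here is adequate for the small candidate sets such a sweep typically produces.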


Related papers