Fugu-MT Paper Translation (Abstract): Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback

Paper overview: Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback

arxiv url: http://arxiv.org/abs/2510.16257v1
Date: Fri, 17 Oct 2025 23:06:21 GMT
Status: Translation complete
System update date: 2025-10-25 03:08:11.737886
Title: Towards Low-Resource Alignment to Diverse Perspectives with Sparse Feedback
Authors: Chu Fei Luo, Samuel Dahan, Xiaodan Zhu
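Abstract summary: We aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. The proposed methods decrease false positives in high-stakes tasks such as hate speech detection and misinformation detection. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.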
Reference score (independently computed attention score): 13.065059683491958
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As language models have a greater impact on society, it is important to ensure they are aligned to a diverse range of perspectives and are able to reflect nuance in human values. However, the most popular training paradigms for modern language models often assume there is one optimal answer for every query, leading to generic responses and poor alignment. In this work, we aim to enhance pluralistic alignment of language models in a low-resource setting with two methods: pluralistic decoding and model steering. We empirically demonstrate that model steering offers consistent improvement over zero-shot and few-shot baselines with only 50 annotated samples. Our proposed methods decrease false positives in several high-stakes tasks such as hate speech detection and misinformation detection, and improve the distributional alignment to human values in GlobalOpinionQA. We hope our work highlights the importance of diversity and how language models can be adapted to consider nuanced perspectives.

Related papers