Fugu-MT Paper Translation (Abstract): Response Time Enhances Alignment with Heterogeneous Preferences

Paper overview: Response Time Enhances Alignment with Heterogeneous Preferences

arxiv url: http://arxiv.org/abs/2605.06987v1
Date: Thu, 07 May 2026 22:05:23 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.64597
Title: Response Time Enhances Alignment with Heterogeneous Preferences
Authors: Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan
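Abstract summary: We show that augmenting preference datasets with a simple secondary signal can restore the identifiability of the population's average preference. Our results open up new opportunities for future data-collection pipelines.
Reference score (independently computed attention score): 49.69696266152175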
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results bring promises and open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.
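The distortion described in the abstract can be illustrated with a small closed-form sketch. This is not the paper's estimator, which the abstract does not specify. It assumes a textbook Drift-Diffusion Model with unit noise, a start point of 0, and symmetric barriers at ±`barrier`; the two-type population, its weights, and all function and variable names are hypothetical.

```python
import math


def choice_prob(drift: float, barrier: float) -> float:
    """Probability that a DDM with unit noise, started at 0, hits +barrier before -barrier."""
    return 1.0 / (1.0 + math.exp(-2.0 * barrier * drift))


def mean_decision_time(drift: float, barrier: float) -> float:
    """Expected first-passage time to either barrier for the same DDM."""
    if abs(drift) < 1e-12:
        return barrier ** 2  # limit of (barrier / drift) * tanh(barrier * drift) as drift -> 0
    return (barrier / drift) * math.tanh(barrier * drift)


# Hypothetical population: (drift, weight). A positive drift favors option A.
# 70% of labelers mildly prefer B; 30% strongly prefer A.
population = [(-0.5, 0.7), (2.0, 0.3)]
barrier = 1.0  # assumed identical for all labelers

# True population-average preference (average drift).
mean_drift = sum(v * w for v, w in population)

# What pooled binary choices reveal: the average choice probability.
pooled_prob = sum(w * choice_prob(v, barrier) for v, w in population)

# Drift a single homogeneous model would infer from the pooled choices alone.
pooled_logit_drift = math.log(pooled_prob / (1.0 - pooled_prob)) / (2.0 * barrier)

print(f"average drift:            {mean_drift:+.4f}")
print(f"pooled P(choose A):       {pooled_prob:.4f}")
print(f"drift implied by choices: {pooled_logit_drift:+.4f}")
for v, w in population:
    print(f"drift {v:+.1f} (weight {w}): mean decision time {mean_decision_time(v, barrier):.4f}")
```

In this assumed population the average drift favors A, while the pooled choices favor B, because a binary label records only the direction of each preference and not its strength. The labelers with the stronger preference also decide faster on average, which is the kind of information a response-time signal adds.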
