Fugu-MT Paper Translation (Summary): Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

Paper Summary: Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

arxiv url: http://arxiv.org/abs/2606.20482v1
Date: Thu, 18 Jun 2026 17:00:48 GMT
Status: Translation complete
Last updated (in system): 2026-06-19 18:23:40.005427
Title: Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users
Authors: Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari, Aryan Sajith, Hamed Zamani
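Abstract summary: Most existing methods collect explicit human feedback and train a reward model to predict human preferences from the response text. We build a new dataset called IFLLM, collecting 1336 multi-turn questions from 59 Mechanical Turk workers. Our reward model based on implicit feedback raises the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative improvement in response quality.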
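As a hedged illustration of what "reward model accuracy" usually means for pairwise preferences, the sketch below scores two responses with a scalar reward, turns the score difference into a Bradley-Terry preference probability, and counts how often the human-preferred response wins. The abstract does not state the paper's training loss or evaluation protocol; the Bradley-Terry formulation, the function names, and the toy scores are assumptions for illustration only.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bt_preference_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that response A beats response B,
    given scalar reward scores r_a and r_b: sigmoid(r_a - r_b)."""
    return _sigmoid(r_a - r_b)


def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Training loss -log sigmoid(r_chosen - r_rejected), computed stably
    as softplus(r_rejected - r_chosen)."""
    d = r_chosen - r_rejected
    return max(-d, 0.0) + math.log1p(math.exp(-abs(d)))


def pairwise_accuracy(pairs: Sequence[Tuple[float, float, int]]) -> float:
    """Fraction of pairs where the reward model ranks the human-preferred
    response higher. Each item is (score_a, score_b, label), where label is
    1 if the human preferred A and 0 if the human preferred B. Ties
    (probability exactly 0.5) count as predicting B."""
    if not pairs:
        raise ValueError("need at least one labelled pair")
    correct = 0
    for r_a, r_b, label in pairs:
        predicted = 1 if bt_preference_prob(r_a, r_b) > 0.5 else 0
        correct += int(predicted == label)
    return correct / len(pairs)


# Hypothetical toy scores, not data from the paper.
pairs = [(2.0, 1.0, 1), (0.5, 1.5, 0), (1.0, 3.0, 1), (0.2, -0.4, 1)]
print(pairwise_accuracy(pairs))  # 0.75
```

Because the sigmoid is monotone, thresholding the probability at 0.5 is equivalent to comparing raw scores; the probability form matters mainly for the training loss.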
Reference score (in-house attention score): 25.631351524871636
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict human preferences from the response text. These methods have two key limitations. First, users rarely provide explicit feedback on LLM responses, which makes high-quality preference annotations expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM, which collects 1336 multi-turn questions from 59 Mechanical Turk workers, together with their mouse trajectories and their eye-gaze points on the LLMs' responses, captured through their webcams. IFLLM shows that users exhibit highly diverse gazing behaviors and mouse trajectories. Our reward model based on implicit user feedback boosts the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative improvement in response quality after applying DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and code can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.
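The abstract applies DPO (Direct Preference Optimization) to eight LLMs. A minimal sketch of the standard DPO objective follows, assuming each preference pair already carries summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model. The names `PreferencePair`, `dpo_loss`, `mean_dpo_loss`, and the default `beta=0.1` are illustrative assumptions rather than details from the paper, and the abstract does not say how its preference pairs are constructed.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PreferencePair:
    """Summed token log-probabilities of one preference pair.

    policy_*: under the model being fine-tuned.
    ref_*:    under the frozen reference model (typically the starting checkpoint).
    """
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(pair: PreferencePair, beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid(beta * margin), where margin is the
    chosen-minus-rejected difference of policy/reference log-ratios."""
    margin = (pair.policy_chosen - pair.ref_chosen) - (
        pair.policy_rejected - pair.ref_rejected
    )
    # -log sigmoid(z) == softplus(-z)
    return _softplus(-beta * margin)


def mean_dpo_loss(pairs: Iterable[PreferencePair], beta: float = 0.1) -> float:
    """Average DPO loss over a batch of preference pairs."""
    losses = [dpo_loss(p, beta) for p in pairs]
    if not losses:
        raise ValueError("need at least one preference pair")
    return sum(losses) / len(losses)


# Hypothetical log-probabilities, not values from the paper.
same = PreferencePair(-10.0, -12.0, -10.0, -12.0)   # policy == reference
better = PreferencePair(-9.0, -13.0, -10.0, -12.0)  # policy favors the chosen reply more
print(round(dpo_loss(same), 4))            # 0.6931 (log 2)
print(dpo_loss(better) < dpo_loss(same))   # True
```

When the policy still equals the reference, every pair contributes log 2; the loss falls as the policy raises the chosen response's likelihood, relative to the reference, more than the rejected one's.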


Related Papers