Fugu-MT Paper Translation (Overview): Learning Transferable Latent User Preferences for Human-Aligned Decision Making

Paper overview: Learning Transferable Latent User Preferences for Human-Aligned Decision Making

arxiv url: http://arxiv.org/abs/2605.12682v1
Date: Tue, 12 May 2026 19:32:10 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.645811
Title: Learning Transferable Latent User Preferences for Human-Aligned Decision Making
Authors: Alina Hyk, Sandhya Saisubramanian
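Abstract summary: Human-aligned decision making must account for both explicitly stated goals and latent user preferences. This paper introduces CLIPR (Conversational Learning for Inferring Preferences and Reasoning). Evaluations on three datasets and a user study show that CLIPR consistently outperforms existing methods in improving alignment and reducing inference costs.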
Reference score (independently computed attention metric): 4.1789291746171715
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large language models (LLMs) are increasingly used as reasoning modules in many applications. While they are efficient in certain tasks, LLMs often struggle to produce human-aligned solutions. Human-aligned decision making requires accounting for both explicitly stated goals and latent user preferences that shape how ambiguous situations should be resolved. Existing approaches to incorporating such preferences either rely on extensive and repeated user interactions or fail to generalize latent preferences across tasks and contexts, limiting their practical applicability. We consider a setting in which an LLM is used for high-level reasoning and is responsible for inferring latent user preferences from limited interactions, which guides downstream decision making. We introduce CLIPR (Conversational Learning for Inferring Preferences and Reasoning), a framework that learns actionable, transferable natural language rules that represent latent user preferences from minimal conversational input. These rules are iteratively refined through adaptive feedback and applied to both in-distribution and out-of-distribution ambiguous tasks across multiple environments. Evaluations on three datasets and a user study show that CLIPR consistently outperforms existing methods in improving alignment and reducing inference costs.

Related papers