Fugu-MT Paper Translation (Abstract): Prompt-Based Value Steering of Large Language Models

Paper overview: Prompt-Based Value Steering of Large Language Models

arxiv url: http://arxiv.org/abs/2511.16688v1
Date: Fri, 14 Nov 2025 14:45:41 GMT
Status: Translation complete
Last updated in system: 2025-11-24 18:08:18.742611
Title: Prompt-Based Value Steering of Large Language Models
Title (reference translation): Prompt-Based Value Steering of Large Language Models
Authors: Giulio Antonio Abbo, Tony Belpaeme
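Abstract summary: The proposed method is a practical, reproducible, and model-agnostic procedure for evaluating whether a prompt candidate can effectively steer generated text toward specific human values. We apply it to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset.
Reference score (attention score computed by this site): 0.0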
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without altering the model or dynamically optimising prompts.
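Abstract (reference translation): Large language models are increasingly used in applications where alignment with human values matters. Model fine-tuning is often used to ensure safe responses, but it is static and poorly suited to everyday situations involving dynamic values and preferences. This paper proposes a practical, reproducible, and model-agnostic procedure for evaluating whether a prompt candidate can effectively steer generated text toward specific human values. We apply it to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. In this setup, we compare a baseline prompt with a prompt explicitly conditioned on values, and show that value steering is possible without changing the model or dynamically optimising prompts.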
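
The abstract mentions a scoring method that quantifies the presence and gain of target values, but does not give its formula. The sketch below is only one plausible reading, not the authors' definition: it assumes each generated response has already been assigned a per-value presence score in [0, 1] by some external scorer, takes presence as the mean score for the target value, and takes gain as the difference in presence between the value-conditioned and baseline prompts. The prompt wording and the names build_prompts, presence, and gain are hypothetical.

```python
from statistics import mean

# The ten basic values from Schwartz's theory, which the abstract names.
SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]


def build_prompts(dialogue_context, target_value):
    """Return (baseline, value-conditioned) prompts.

    The template wording is a hypothetical placeholder, not the paper's.
    """
    baseline = f"{dialogue_context}\nReply to the last message."
    steered = f"{baseline}\nYour reply should reflect the value of {target_value}."
    return baseline, steered


def presence(scored_responses, target_value):
    """Mean presence of the target value over a list of scored responses.

    Each item is a dict mapping a value name to a score in [0, 1];
    how those scores are produced is outside this sketch.
    """
    return mean(scores[target_value] for scores in scored_responses)


def gain(baseline_scored, steered_scored, target_value):
    """Assumed gain: steered presence minus baseline presence."""
    return presence(steered_scored, target_value) - presence(baseline_scored, target_value)


if __name__ == "__main__":
    # Made-up scores for illustration only; not results from the paper.
    baseline_scored = [{"benevolence": 0.2}, {"benevolence": 0.4}]
    steered_scored = [{"benevolence": 0.6}, {"benevolence": 0.8}]
    print(gain(baseline_scored, steered_scored, "benevolence"))
```

Under this reading, a positive gain means the value-conditioned prompt produced responses with more of the target value than the baseline did, which is the comparison the abstract describes.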
