Fugu-MT Paper Translation (Abstract): HIPO: Instruction Hierarchy via Constrained Reinforcement Learning

Paper summary: HIPO: Instruction Hierarchy via Constrained Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.16152v1
Date: Tue, 17 Mar 2026 06:12:41 GMT
Status: Translation complete
System update date: 2026-03-18 17:42:07.123361
Title: HIPO: Instruction Hierarchy via Constrained Reinforcement Learning
Authors: Keru Chen, Jun Luo, Sen Lin, Yingbin Liang, Alvaro Velasquez, Nathaniel Bastian, Shaofeng Zou
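Abstract summary: HIPO is a novel alignment framework that formulates HIF as a Constrained Markov Decision Process. HIPO elevates system prompts from mere input context to strict algorithmic boundaries.
Reference score (independently computed attention score): 57.40686733111483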
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Hierarchical Instruction Following (HIF) refers to the problem of prompting large language models with a priority-ordered stack of instructions. Standard methods like RLHF and DPO typically fail in this problem since they mainly optimize for a single objective, failing to explicitly enforce system prompt compliance. Meanwhile, supervised fine-tuning relies on mimicking filtered, compliant data, which fails to establish the priority asymmetry at the algorithmic level. In this paper, we introduce HIPO, a novel alignment framework that formulates HIF as a Constrained Markov Decision Process. HIPO elevates system prompts from mere input context to strict algorithmic boundaries. Using a primal-dual safe reinforcement learning approach, the algorithm dynamically enforces system prompt compliance as an explicit constraint, maximizing user utility strictly within this feasible region. Extensive evaluations across diverse model architectures (e.g., Qwen, Phi, Llama) demonstrate that HIPO significantly improves both system compliance and user utility. Furthermore, mechanistic analysis reveals that this constrained optimization autonomously drives the model to shift its attention toward long-range system tokens, providing a principled foundation for reliable LLM deployment in complex workflows.
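The primal-dual idea named in the abstract can be illustrated with a toy sketch. This is not the HIPO algorithm itself, whose update rules the abstract does not specify. It assumes a hypothetical one-step problem with three candidate responses, made-up `UTILITY` and `COMPLIANCE` scores, a compliance threshold of 0.9, and arbitrary step sizes. The policy takes an exponentiated-gradient step on the Lagrangian reward `utility + lam * compliance`, while the multiplier `lam` rises whenever expected compliance falls below the threshold. The time-averaged policy is returned, since the individual iterates of such schemes tend to oscillate around the constrained optimum.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected(values, probs):
    """Expectation of per-action values under a probability vector."""
    return sum(v * p for v, p in zip(values, probs))


def primal_dual(utility, compliance, threshold,
                steps=20000, lr_policy=0.05, lr_dual=0.05):
    """Toy primal-dual loop: maximize E[utility] s.t. E[compliance] >= threshold.

    All arguments are illustrative assumptions, not values from the paper.
    Returns the time-averaged policy and the final Lagrange multiplier.
    """
    logits = [0.0] * len(utility)
    lam = 0.0  # Lagrange multiplier for the compliance constraint
    avg_probs = [0.0] * len(utility)
    for _ in range(steps):
        probs = softmax(logits)
        avg_probs = [a + p / steps for a, p in zip(avg_probs, probs)]
        # Primal step: exponentiated-gradient ascent on the Lagrangian reward.
        logits = [z + lr_policy * (u + lam * c)
                  for z, u, c in zip(logits, utility, compliance)]
        # Dual step: raise lam while the constraint is violated, lower it
        # otherwise, and project back onto lam >= 0.
        violation = threshold - expected(compliance, probs)
        lam = max(0.0, lam + lr_dual * violation)
    return avg_probs, lam


# Hypothetical scores: response 0 is the most useful but ignores the system
# prompt; responses 1 and 2 comply with it.
UTILITY = [1.0, 0.6, 0.2]
COMPLIANCE = [0.0, 1.0, 1.0]

if __name__ == "__main__":
    policy, lam = primal_dual(UTILITY, COMPLIANCE, threshold=0.9)
    print("average policy:", [round(p, 3) for p in policy])
    print("expected compliance:", round(expected(COMPLIANCE, policy), 3))
    print("expected utility:", round(expected(UTILITY, policy), 3))
    print("final multiplier:", round(lam, 3))
```

In this toy setting the unconstrained optimum would always pick response 0. The multiplier prices compliance until the averaged policy places most of its mass on response 1, the most useful of the compliant responses, which is the sense in which utility is maximized inside the feasible region.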


Related papers