Fugu-MT paper translation (abstract): Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models

arxiv url: http://arxiv.org/abs/2505.20087v1
Date: Mon, 26 May 2025 15:01:37 GMT
Status: translation complete
Last updated in system: 2025-05-27 16:58:43.537037
Title: Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models
Authors: Makesh Narsimhan Sreedhar, Traian Rebedea, Christopher Parisien
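Abstract summary: Reasoning-based language models have demonstrated strong performance across various domains. Recent research has shown that reasoning also offers significant benefits for safety and guardrail applications. This study focuses on two key dimensions: data efficiency and inference efficiency.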
Reference score (independently computed attention score): 3.102576158218633
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Reasoning-based language models have demonstrated strong performance across various domains, with the most notable gains seen in mathematical and coding tasks. Recent research has shown that reasoning also offers significant benefits for LLM safety and guardrail applications. In this work, we conduct a comprehensive analysis of training reasoning-based guardrail models for content moderation, with an emphasis on generalization to custom safety policies at inference time. Our study focuses on two key dimensions: data efficiency and inference efficiency. On the data front, we find that reasoning-based models exhibit strong sample efficiency, achieving competitive performance with significantly fewer training examples than their non-reasoning counterparts. This unlocks the potential to repurpose the remaining data for mining high-value, difficult samples that further enhance model performance. On the inference side, we evaluate practical trade-offs by introducing reasoning budgets, examining the impact of reasoning length on latency and accuracy, and exploring dual-mode training to allow runtime control over reasoning behavior. Our findings will provide practical insights for researchers and developers to effectively and efficiently train and deploy reasoning-based guardrail models in real-world systems.

Related papers

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
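Large reasoning models (LRMs) exhibit overthinking, which hampers efficiency and inflates inference cost. We propose two lightweight methods to improve LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
Reference translation of paper (metadata) (2025-06-18T17:18:12Z)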
Dissecting Long Reasoning Models: An Empirical Study [94.31064312707211]
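We systematically analyze the roles of positive and negative samples in reinforcement learning (RL). We identify substantial data inefficiency in group relative policy optimization, where more than half of the samples have zero advantage. We also investigate unstable performance across various reasoning models and benchmarks, attributing the instability to uncertain problems with ambiguous outcomes.
Reference translation of paper (metadata) (2025-06-05T11:47:10Z)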
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning [82.43575191712726]
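This paper proposes a fine-grained analytic framework to reveal how reinforcement learning affects reasoning. The framework specifically investigates key elements hypothesized to benefit from RL training.
Reference translation of paper (metadata) (2025-06-05T07:53:59Z)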
Rationales Are Not Silver Bullets: Measuring the Impact of Rationales on Model Performance and Reliability [70.4107059502882]
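Training language models with rationale augmentation has been shown to be beneficial in many existing works. We conduct a comprehensive investigation to thoroughly examine the impact of rationales on model performance.
Reference translation of paper (metadata) (2025-05-30T02:39:37Z)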
Behavior Injection: Preparing Language Models for Reinforcement Learning [24.46625106928253]
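Reinforcement fine-tuning (RFT) has emerged as a powerful post-training technique for enhancing the reasoning ability of large language models (LLMs). However, LLMs can respond very inconsistently to RFT. We propose behavior injection, a task-agnostic data augmentation scheme applied prior to RL.
Reference translation of paper (metadata) (2025-05-25T00:54:50Z)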
Efficient Inference for Large Reasoning Models: A Survey [42.61170621552432]
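Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs). However, their deliberative reasoning processes introduce inefficiencies in token usage, memory consumption, and inference time. This survey reviews efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving reasoning quality.
Reference translation of paper (metadata) (2025-03-29T13:27:46Z)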
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
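Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advances in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
Reference translation of paper (metadata) (2025-03-20T17:59:38Z)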
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
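We propose a framework that learns planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories. Results on challenging logical reasoning benchmarks demonstrate the effectiveness of the learning framework.
Reference translation of paper (metadata) (2024-02-01T15:18:33Z)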

The related papers list is generated automatically from the titles and abstracts of papers on this site.
