Fugu-MT Paper Translation (Summary): KnowRL: Teaching Language Models to Know What They Know

Paper summary: KnowRL: Teaching Language Models to Know What They Know

arxiv url: http://arxiv.org/abs/2510.11407v1
Date: Mon, 13 Oct 2025 13:47:14 GMT
Status: Translation complete
System update date: 2025-10-14 18:06:30.383311
Title: KnowRL: Teaching Language Models to Know What They Know
Authors: Sahil Kale, Devendra Singh Dhami,
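Abstract summary: This paper proposes KnowRL, a simple but powerful framework that strengthens a model's internal understanding of its own feasibility boundaries. The framework combines two components: (i) introspection, in which the model generates and classifies tasks it judges feasible or infeasible, and (ii) consensus-based rewarding, in which the stability of the model's self-knowledge assessment is reinforced through internal agreement. Even with only a small seed set and no external supervision, accuracy improved by up to 28% and F1 by up to 12%.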
Reference score (independently calculated attention score): 9.341830361844337
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Truly reliable AI requires more than simply scaling up knowledge; it demands the ability to know what it knows and when it does not. Yet recent research shows that even the best LLMs misjudge their own competence in more than one in five cases, making any response born of such internal uncertainty impossible to fully trust. Inspired by self-improvement reinforcement learning techniques that require minimal data, we present a simple but powerful framework KnowRL that strengthens a model's internal understanding of its own feasibility boundaries, enabling safer and more responsible behaviour. Our framework combines two components: (i) introspection, where the model generates and classifies tasks it judges feasible or infeasible, and (ii) consensus-based rewarding, where stability of self-knowledge assessment is reinforced through internal agreement. By using internally generated data, this design strengthens consistency in self-knowledge and entirely avoids costly external supervision. In experiments on LLaMA-3.1-8B and Qwen-2.5-7B, KnowRL steadily improved self-knowledge, validated by both intrinsic self-consistency and extrinsic benchmarking. With nothing more than a small seed set and no external supervision, our method drove gains as high as 28% in accuracy and 12% in F1, outperforming baselines in just a few iterations. Our framework essentially unlocks the untapped capacity of LLMs to self-improve their knowledge awareness, opening the door to reliable, more accountable AI and safer deployment in critical applications. Owing to its simplicity and independence from external effort, we encourage applying this reliability-enhancing process to all future models.
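The abstract describes consensus-based rewarding only at a high level and does not give the exact reward formula. As a minimal sketch, assuming the reward is a majority-vote agreement score over several sampled "feasible"/"infeasible" judgments of the same self-generated task, the idea could look like this in Python. The function names `consensus_reward` and `build_pseudo_labels` and the `min_agreement` threshold are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_reward(judgments: List[str]) -> Tuple[str, List[float]]:
    """Return the majority label and a reward for each sampled judgment.

    Assumption (not from the paper): a sampled judgment earns 1.0 if it
    agrees with the majority label across all samples, otherwise 0.0.
    """
    if not judgments:
        raise ValueError("need at least one sampled judgment")
    # most_common orders equal counts by first occurrence, so ties go to
    # whichever label was sampled first.
    majority, _ = Counter(judgments).most_common(1)[0]
    rewards = [1.0 if j == majority else 0.0 for j in judgments]
    return majority, rewards


def build_pseudo_labels(
    samples_per_task: Dict[str, List[str]], min_agreement: float = 0.6
) -> Dict[str, str]:
    """Keep only tasks whose agreement rate reaches a (hypothetical) threshold.

    The retained majority labels act as internally generated training data,
    with no external supervision involved.
    """
    labels: Dict[str, str] = {}
    for task, judgments in samples_per_task.items():
        majority, rewards = consensus_reward(judgments)
        if sum(rewards) / len(rewards) >= min_agreement:
            labels[task] = majority
    return labels


if __name__ == "__main__":
    samples = ["feasible", "feasible", "infeasible", "feasible", "infeasible"]
    label, rewards = consensus_reward(samples)
    print(label, rewards)  # feasible [1.0, 1.0, 0.0, 1.0, 0.0]
```

In an actual reinforcement-learning loop, these per-sample rewards would feed a policy-gradient update of the model; that step depends on the training setup and is left out of this sketch.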


Related papers