Fugu-MT Paper Translation (Abstract): ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

Paper overview: ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

arxiv url: http://arxiv.org/abs/2510.09062v1
Date: Fri, 10 Oct 2025 07:08:44 GMT
Status: Translation complete
Last updated in system: 2025-10-14 00:38:48.301252
Title: ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability
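Title (reference translation): ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability
Authors: Chung-En Sun, Ge Yan, Akshay Kulkarni, Tsui-Wei Weng
Abstract summary: We argue that usable reasoning systems must be trustworthy, characterized by three properties: interpretability, faithfulness, and reliability. We propose ReFIne, a new training framework that integrates supervised fine-tuning with GRPO. Experimental results show that ReFIne models generate clearer and better-structured reasoning traces.
Reference score (independently calculated attention score): 23.70973331911138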
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in long chain-of-thought (CoT) reasoning have largely prioritized answer accuracy and token efficiency, while overlooking aspects critical to trustworthiness. We argue that usable reasoning systems must be trustworthy, characterized by three properties: interpretability, faithfulness, and reliability. To this end, we propose ReFIne, a new training framework that integrates supervised fine-tuning with GRPO to encourage models to: (i) improve interpretability by producing structured, tag-based traces with high-level planning that are easier for humans to follow; (ii) enhance faithfulness by explicitly disclosing the decisive information guiding each solution, with consistent cross-section references; and (iii) promote reliability by providing self-assessments of both the derivation's soundness and the confidence of the final answer. We apply ReFIne to the Qwen3 models at multiple scales (1.7B/4B/8B) and evaluate across mathematical benchmarks of varying difficulty. Our experimental results show that ReFIne models generate clearer and better-structured reasoning traces (interpretability +44.0%), more faithfully expose their underlying decision process (faithfulness +18.8%), and offer informative confidence estimates (reliability +42.4%). These findings highlight an overlooked but important direction: reasoning models should be optimized not only for accuracy, but also for broader dimensions of trustworthiness. Our code is available at: https://github.com/Trustworthy-ML-Lab/Training_Trustworthy_LRM_with_Refine
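Abstract (reference translation): Recent advances in long chain-of-thought (CoT) reasoning have largely prioritized answer accuracy and token efficiency while overlooking aspects critical to trustworthiness. We argue that usable reasoning systems must be trustworthy, characterized by three properties: interpretability, faithfulness, and reliability. To this end, we propose ReFIne, a new training framework that integrates supervised fine-tuning with GRPO. It encourages models to: (i) improve interpretability by producing structured, tag-based traces with high-level planning that are easier for humans to follow; (ii) enhance faithfulness by explicitly disclosing the decisive information guiding each solution, with consistent cross-section references; and (iii) promote reliability by providing self-assessments of both the derivation's soundness and the confidence of the final answer. We apply ReFIne to Qwen3 models at multiple scales (1.7B/4B/8B) and evaluate on mathematical benchmarks of varying difficulty. Experimental results show that ReFIne models generate clearer and better-structured reasoning traces (interpretability +44.0%), more faithfully expose their underlying decision process (faithfulness +18.8%), and offer informative confidence estimates (reliability +42.4%). Reasoning models should be optimized not only for accuracy but also for broader dimensions of trustworthiness. https://github.com/Trustworthy-ML-Lab/Training_Trustworthy_LRM_with_Refine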
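
The abstract names GRPO as the reinforcement-learning component but does not describe the reward design. As a rough illustration only, the group-relative advantage step that gives GRPO its name can be sketched as below: several responses are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the example rewards, the `eps` constant, and the use of the population standard deviation are assumptions made for this sketch; none of them comes from the ReFIne paper or its repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar reward per response sampled for the same prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four reasoning traces sampled for one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses scoring above the group mean receive positive advantages and those below receive negative ones, so no separate value network is needed to form a baseline.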


Related papers