Fugu-MT Paper Translation (Abstract): Peeking Inside LLMs: Leveraging Internal Artifacts of LLMs for Enhancing Reliability in Legal Classification

Paper summary: Peeking Inside LLMs: Leveraging Internal Artifacts of LLMs for Enhancing Reliability in Legal Classification

arxiv url: http://arxiv.org/abs/2606.20929v1
Date: Thu, 18 Jun 2026 20:44:22 GMT
Status: Translation complete
Last updated in system: 2026-06-26 11:49:37.440295
Title: Peeking Inside LLMs: Leveraging Internal Artifacts of LLMs for Enhancing Reliability in Legal Classification
Authors: Sudipta Santra, Debtanu Datta, Saptarshi Ghosh
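Abstract summary: Large Language Models (LLMs) are increasingly being adopted in the legal domain. Despite their strong performance, LLMs are prone to generating incorrect or hallucinated outputs. The paper explores the potential of leveraging LLMs' internal artifacts to detect the correctness of predictions in legal-domain classification tasks.
Reference score (attention metric computed independently by this site): 2.6691901601750785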
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are increasingly being adopted in the legal domain. However, despite their strong performance, LLMs are prone to generating incorrect or hallucinated outputs, raising serious concerns about their reliability in high-stakes domains such as law. Detecting the correctness of responses of LLM-based systems is therefore a critical challenge. In this work, we explore the potential of leveraging internal artifacts of LLM to detect the correctness of their predictions in legal-domain classification tasks. We develop approaches that utilize features derived from these internal artifacts to build downstream classifiers capable of identifying incorrect LLM outputs. We evaluate our approach on two representative legal classification tasks: bail decision prediction and statute violation prediction. Our experimental results demonstrate that LLMs' internal artifacts are reliable indicators for detecting incorrect predictions in legal classification tasks, and can be applied to enhance the reliability of LLM-based classification systems.
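The abstract does not say which internal artifacts or which downstream classifier the authors use. The sketch below is therefore only one possible reading, not the paper's method. It assumes the artifact is the vector of class logits for a two-way decision (for example, bail granted versus denied), derives three hypothetical features from it (top probability, top-2 margin, entropy), and fits a small logistic regression that estimates whether the LLM's prediction is correct. All function names and the training data are invented for illustration; the data is synthetic.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_features(logits):
    """Assumed features (not from the paper): top prob, top-2 margin, entropy."""
    probs = sorted(softmax(logits), reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return [probs[0], probs[0] - probs[1], entropy]


def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=200):
    """Plain SGD logistic regression; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def correctness_score(w, b, logits):
    """Estimated probability that the LLM's predicted class is correct."""
    x = confidence_features(logits)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic training set (fabricated for illustration only): predictions
# labelled correct (1) get a wide logit gap, incorrect ones (0) a narrow gap.
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    X.append(confidence_features([rng.uniform(2.0, 5.0), 0.0]))
    y.append(1)
    X.append(confidence_features([rng.uniform(0.0, 1.0), 0.0]))
    y.append(0)

w, b = train_logistic(X, y)
confident = correctness_score(w, b, [4.0, 0.0])
ambiguous = correctness_score(w, b, [0.2, 0.0])
print(confident, ambiguous)
```

In a real system the labels would come from comparing LLM predictions against gold annotations, and the features could instead be drawn from other internals such as hidden states or attention weights. Predictions with a low score could then be routed to human review.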

Related papers