Fugu-MT Paper Translation (Summary): Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

Paper Summary: Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task

arxiv url: http://arxiv.org/abs/2604.23730v1
Date: Sun, 26 Apr 2026 14:15:43 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.525559
Title: Expert Evaluation of LLM's Open-Ended Legal Reasoning on the Japanese Bar Exam Writing Task
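Title (reference translation): Expert Evaluation of LLMs' Open-Ended Legal Reasoning
Authors: Jungmin Choi, Keisuke Sakaguchi, Hiroaki Yamada
Abstract summary: Large language models (LLMs) have shown strong performance on legal benchmarks, including the multiple-choice components of bar exams. This study presents the first dataset designed to evaluate the open-ended legal reasoning performance of LLMs within the Japanese jurisdiction.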
Reference score (site-computed attention score): 10.316445397110291
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have shown strong performance on legal benchmarks, including the multiple-choice components of bar exams. However, their capacity for generating open-ended legal reasoning in realistic scenarios remains insufficiently explored. Notably, to the best of our knowledge, there are no prior studies or datasets addressing this issue in the Japanese context. This study presents the first dataset designed to evaluate the open-ended legal reasoning performance of LLMs within the Japanese jurisdiction. The dataset is based on the writing component of the Japanese bar examination, which requires examinees to identify multiple legal issues from long narratives and to construct structured legal arguments in free-text format. Our key contribution is the manual evaluation of LLM-generated responses by legal experts, which reveals limitations and challenges in legal reasoning. Moreover, we conducted a manual analysis of hallucinations to characterize when and how the models introduce content not supported by precedent or law. Our real exam questions, model-generated responses, and expert evaluations reveal the milestones of current LLMs in the Japanese legal domain. Our dataset and relevant resources will be made available online.
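Abstract (reference translation): Large language models (LLMs) have shown strong performance on legal benchmarks, including the multiple-choice components of bar exams. However, their ability to generate open-ended legal reasoning in realistic scenarios has not yet been sufficiently examined. In particular, to the best of our knowledge, no prior studies or datasets address this problem in the Japanese context. This study presents the first dataset designed to evaluate the open-ended legal reasoning performance of LLMs within the Japanese jurisdiction. The dataset is based on the writing component of the Japanese bar examination, which requires identifying multiple legal issues from long narratives and constructing structured legal arguments in free-text form. Our main contribution is the manual evaluation of LLM-generated responses by legal experts, which reveals limitations and challenges in legal reasoning. In addition, we manually analyzed hallucinations to characterize when and how the models introduce content not supported by precedent or law. The real exam questions, model-generated responses, and expert evaluations reveal the milestones of current LLMs in the Japanese legal domain. The dataset and related resources will be made available online.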


Related Papers