Fugu-MT Paper Translation (Summary): Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

Paper Summary: Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

arxiv url: http://arxiv.org/abs/2510.17900v1
Date: Sun, 19 Oct 2025 10:04:29 GMT
Status: Translation complete
Last updated in system: 2025-10-25 03:08:12.331484
Title: Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
Authors: Kush Juvekar, Arghya Bhattacharya, Sai Khadloya, Utkarsh Saxena,
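Abstract summary: We use India's public legal examinations as a transparent proxy. Our benchmark assembles objective screens from national and state exams. We also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam.
Reference score (in-house attention score): 0.5308136763388956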
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are entering legal workflows, yet we lack a jurisdiction-specific framework to assess their baseline competence therein. We use India's public legal examinations as a transparent proxy. Our multi-year benchmark assembles objective screens from top national and state exams and evaluates open and frontier LLMs under real-world exam conditions. To probe beyond multiple-choice questions, we also include a lawyer-graded, paired-blinded study of long-form answers from the Supreme Court's Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded, India-specific yardstick for LLM court-readiness released with datasets and protocols. Our work shows that while frontier systems consistently clear historical cutoffs and often match or exceed recent top-scorer bands on objective exams, none surpasses the human topper on long-form reasoning. Grader notes converge on three reliability failure modes: procedural or format compliance, authority or citation discipline, and forum-appropriate voice and structure. These findings delineate where LLMs can assist (checks, cross-statute consistency, statute and precedent lookups) and where human leadership remains essential: forum-specific drafting and filing, procedural and relief strategy, reconciling authorities and exceptions, and ethical, accountable judgment.
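The abstract mentions a "paired-blinded" study, in which lawyers grade long-form answers without knowing which one came from the model. One way such a setup could work is sketched below. Every part of it is an assumption rather than the authors' released protocol: the names `BlindedPair`, `blind_pairs` and `unblind_scores`, the per-question random A/B slot assignment, and the idea that graders return one numeric score per slot. Each question's model answer and human answer are placed in anonymous slots A and B using a seeded RNG. A separate key records which slot holds the model answer, and grader scores are mapped back to "model" and "human" only after grading.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BlindedPair:
    """Hypothetical grader-facing record: two anonymous answers, no source labels."""
    question_id: str
    answer_a: str
    answer_b: str


def blind_pairs(items, seed=0):
    """Blind (question_id, model_answer, human_answer) triples.

    Returns (pairs, key): `pairs` goes to graders; `key` maps each question_id
    to the slot ("A" or "B") that holds the model answer and stays hidden.
    """
    rng = random.Random(seed)  # seeded so the blinding is reproducible
    pairs, key = [], {}
    for qid, model_ans, human_ans in items:
        if rng.random() < 0.5:
            pairs.append(BlindedPair(qid, model_ans, human_ans))
            key[qid] = "A"
        else:
            pairs.append(BlindedPair(qid, human_ans, model_ans))
            key[qid] = "B"
    return pairs, key


def unblind_scores(scores, key):
    """Map grader scores {qid: (score_A, score_B)} back to model vs. human."""
    result = {}
    for qid, (score_a, score_b) in scores.items():
        if key[qid] == "A":
            result[qid] = {"model": score_a, "human": score_b}
        else:
            result[qid] = {"model": score_b, "human": score_a}
    return result


if __name__ == "__main__":
    items = [("q1", "model answer 1", "human answer 1"),
             ("q2", "model answer 2", "human answer 2")]
    pairs, key = blind_pairs(items, seed=42)
    grader_scores = {p.question_id: (6.0, 7.5) for p in pairs}  # illustrative numbers only
    print(unblind_scores(grader_scores, key))
```

Keeping the key separate from the grader-facing records is the design choice that makes the study blind. Fixing the seed lets the same assignment be regenerated for audit.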

Related Papers