Fugu-MT Paper Translation (Abstract): KoBLEX: Open Legal Question Answering with Multi-hop Reasoning

Paper summary: KoBLEX: Open Legal Question Answering with Multi-hop Reasoning

arxiv url: http://arxiv.org/abs/2509.01324v1
Date: Mon, 01 Sep 2025 10:07:00 GMT
Status: translation complete
Last updated in system: 2025-09-04 15:17:03.637741
Title: KoBLEX: Open Legal Question Answering with Multi-hop Reasoning
Authors: Jihyung Lee, Daehui Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee
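Abstract summary: We introduce the Korean Benchmark for Legal EXplainable QA (KoBLEX), which is designed to evaluate provision-grounded, multi-hop legal reasoning. We also propose a method called Parametric provision-guided Selection Retrieval (ParSeR).
Reference score (attention metric computed independently by the site): 12.122913185860634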
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLM) have achieved remarkable performances in general domains and are now extending into the expert domain of law. Several benchmarks have been proposed to evaluate LLMs' legal capabilities. However, these benchmarks fail to evaluate open-ended and provision-grounded Question Answering (QA). To address this, we introduce a Korean Benchmark for Legal EXplainable QA (KoBLEX), designed to evaluate provision-grounded, multi-hop legal reasoning. KoBLEX includes 226 scenario-based QA instances and their supporting provisions, created using a hybrid LLM-human expert pipeline. We also propose a method called Parametric provision-guided Selection Retrieval (ParSeR), which uses LLM-generated parametric provisions to guide legally grounded and reliable answers. ParSeR facilitates multi-hop reasoning on complex legal questions by generating parametric provisions and employing a three-stage sequential retrieval process. Furthermore, to better evaluate the legal fidelity of the generated answers, we propose Legal Fidelity Evaluation (LF-Eval). LF-Eval is an automatic metric that jointly considers the question, answer, and supporting provisions and shows a high correlation with human judgments. Experimental results show that ParSeR consistently outperforms strong baselines, achieving the best results across multiple LLMs. Notably, compared to standard retrieval with GPT-4o, ParSeR achieves +37.91 higher F1 and +30.81 higher LF-Eval. Further analyses reveal that ParSeR efficiently delivers consistent performance across reasoning depths, with ablations confirming the effectiveness of ParSeR.

Related papers