Fugu-MT paper translation (abstract): Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions

Paper overview: Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions

arxiv url: http://arxiv.org/abs/2603.22973v1
Date: Tue, 24 Mar 2026 09:10:57 GMT
Status: translation complete
Last updated in system: 2026-03-25 19:53:37.397091
Title: Where Experts Disagree, Models Fail: Detecting Implicit Legal Citations in French Court Decisions
Authors: Avrile Floro, Tamara Dhorasoo, Soline Pellez, Nils Holzenberger
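Abstract summary: We focus on implicit citation of the French Civil Code in first-instance court decisions. We show that expert disagreement predicts model failures. Despite these limits, reframing the task as top-k ranking and leveraging multi-model consensus yields 76% precision at k = 200 in an unsupervised setting.
Reference score (independently computed attention score): 3.8449738927037207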
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computational methods applied to legal scholarship hold the promise of analyzing law at scale. We start from a simple question: how often do courts implicitly apply statutory rules? This requires distinguishing legal reasoning from semantic similarity. We focus on implicit citation of the French Civil Code in first-instance court decisions and introduce a benchmark of 1,015 passage-article pairs annotated by three legal experts. We show that expert disagreement predicts model failures. Inter-annotator agreement is moderate ($\kappa = 0.33$) with 43% of disagreements involving the boundary between factual description and legal reasoning. Our supervised ensemble achieves F1 = 0.70 (77% accuracy), but this figure conceals an asymmetry: 68% of false positives fall on the 33% of cases where the annotators disagreed. Despite these limits, reframing the task as top-k ranking and leveraging multi-model consensus yields 76% precision at k = 200 in an unsupervised setting. Moreover, the remaining false positives tend to surface legally ambiguous applications rather than obvious errors.
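The abstract reports precision at k = 200 for a ranking built from multi-model consensus, but it does not say how the consensus is computed. The following is a minimal sketch of one plausible reading, assuming each model assigns a relevance score to every passage-article pair and that consensus means averaging per-model ranks. The function names, the mean-rank aggregation, and the toy scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def consensus_ranking(score_lists):
    """Order candidate indices by mean rank across models (best first).

    Assumption: mean-rank aggregation stands in for the paper's
    unspecified "multi-model consensus".
    """
    n = len(score_lists[0])
    ranks_per_model = []
    for scores in score_lists:
        # Sort candidate indices by descending score; position = rank.
        order = sorted(range(n), key=lambda i: scores[i], reverse=True)
        ranks = [0] * n
        for rank, idx in enumerate(order):
            ranks[idx] = rank
        ranks_per_model.append(ranks)
    mean_ranks = [mean(r[i] for r in ranks_per_model) for i in range(n)]
    # sorted() is stable, so ties keep the original candidate order.
    return sorted(range(n), key=lambda i: mean_ranks[i])


def precision_at_k(ranking, labels, k):
    """Fraction of the top-k ranked candidates labeled positive (1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    return sum(labels[i] for i in top) / len(top)


# Toy data (hypothetical): two models scoring five candidate pairs.
model_a = [0.9, 0.2, 0.8, 0.1, 0.6]
model_b = [0.7, 0.3, 0.9, 0.2, 0.1]
labels = [1, 0, 1, 0, 0]  # 1 = experts judged the article implicitly applied

ranking = consensus_ranking([model_a, model_b])
p_at_2 = precision_at_k(ranking, labels, 2)
```

With the paper's setup, `k` would be 200 and `labels` would come from expert review of the top-ranked pairs. Rank averaging is used here only because it needs no labeled data, which matches the unsupervised setting the abstract describes.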
