Related papers: JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences

URL: http://arxiv.org/abs/2508.16870v1
Date: Sat, 23 Aug 2025 02:03:16 GMT
Title: JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences
Authors: David Beauchemin, Michelle Albert-Rochette, Richard Khoury, Pierre-Luc Déziel
Abstract summary: This paper introduces FrJUDGE, a new dataset to assess legal meaning preservation between two legal texts. It also introduces JUDGEBERT, a novel evaluation metric designed to assess legal meaning preservation in French legal text simplification.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Simplifying text while preserving its meaning is a complex yet essential task, especially in sensitive domain applications like legal texts. When applied to a specialized field, like the legal domain, preservation differs significantly from its role in regular texts. This paper introduces FrJUDGE, a new dataset to assess legal meaning preservation between two legal texts. It also introduces JUDGEBERT, a novel evaluation metric designed to assess legal meaning preservation in French legal text simplification. JUDGEBERT demonstrates a superior correlation with human judgment compared to existing metrics. It also passes two crucial sanity checks, while other metrics did not: For two identical sentences, it always returns a score of 100%; on the other hand, it returns 0% for two unrelated sentences. Our findings highlight its potential to transform legal NLP applications, ensuring accuracy and accessibility for text simplification for legal practitioners and lay users.
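The two sanity checks described in the abstract can be expressed as a small test harness. The sketch below is illustrative only: it does not use JUDGEBERT, whose interface is not described here. The names `jaccard_percent` and `passes_sanity_checks` are hypothetical, and the token-overlap metric is a toy stand-in chosen only because it happens to satisfy both checks on the sample sentences.

```python
from typing import Callable, Iterable, Tuple

# Any meaning-preservation metric is modelled as a function of two
# sentences returning a score in percent (0 to 100). This is an assumption.
Metric = Callable[[str, str], float]


def jaccard_percent(a: str, b: str) -> float:
    """Toy stand-in metric (NOT JUDGEBERT): token-set Jaccard overlap, 0-100."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 100.0  # two empty sentences are treated as identical
    return 100.0 * len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def passes_sanity_checks(
    metric: Metric,
    identical: Iterable[str],
    unrelated: Iterable[Tuple[str, str]],
    tol: float = 1e-9,
) -> bool:
    """Return True if identical pairs score 100% and unrelated pairs score 0%."""
    same_ok = all(abs(metric(s, s) - 100.0) <= tol for s in identical)
    diff_ok = all(abs(metric(a, b)) <= tol for a, b in unrelated)
    return same_ok and diff_ok


# Hypothetical sample sentences, invented for illustration.
identical = ["Le locataire doit payer le loyer."]
unrelated = [("Le locataire doit payer le loyer.", "Il pleut souvent en automne.")]

result = passes_sanity_checks(jaccard_percent, identical, unrelated)
print(result)
```

A real evaluation would pass the metric under test in place of `jaccard_percent` and use a larger set of sentence pairs; a metric that returns a constant score, for example, fails both checks.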

Related papers

Evaluating Legal Reasoning Traces with Legal Issue Tree Rubrics [49.3262123849242]
We introduce LEGIT (LEGal Issue Trees), a novel large-scale (24K instances) expert-level legal reasoning dataset. We convert court judgments into hierarchical trees of opposing parties' arguments and the court's conclusions, which serve as rubrics for evaluating the issue coverage and correctness of the reasoning traces.
arXiv Detail & Related papers (2025-11-30T18:32:43Z)
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification [6.549338652948716]
We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
arXiv Detail & Related papers (2025-02-09T10:07:05Z)
The Use of Readability Metrics in Legal Text: A Systematic Literature Review [3.439579933384111]
Linguistic complexity is an important contributor to difficulties experienced by readers. Document readability metrics have been developed to measure document readability. Not all legal domains are well represented in terms of readability metrics.
arXiv Detail & Related papers (2024-11-14T15:04:17Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [121.44909266398194]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2. We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules. SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
Unlocking Practical Applications in Legal Domain: Evaluation of GPT for Zero-Shot Semantic Annotation of Legal Texts [0.0]
We evaluate the capability of a state-of-the-art generative pre-trained transformer (GPT) model to perform semantic annotation of short text snippets. We found that the GPT model performs surprisingly well in zero-shot settings on diverse types of documents.
arXiv Detail & Related papers (2023-05-08T01:55:53Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z)
Exploiting Contrastive Learning and Numerical Evidence for Confusing Legal Judgment Prediction [46.71918729837462]
Given the fact description text of a legal case, legal judgment prediction aims to predict the case's charge, law article and penalty term. Previous studies fail to distinguish different classification errors with a standard cross-entropy classification loss. We propose a MoCo-based supervised contrastive learning approach to learn distinguishable representations. We further enhance the representation of the fact description with extracted crime amounts, which are encoded by a pre-trained numeracy model.
arXiv Detail & Related papers (2022-11-15T15:53:56Z)
Unsupervised Simplification of Legal Texts [0.0]
We introduce an unsupervised simplification method for legal texts (USLT). USLT performs domain-specific TS by replacing complex words and splitting long sentences. We demonstrate that USLT outperforms state-of-the-art domain-general TS methods in text simplicity while keeping the semantics intact.
arXiv Detail & Related papers (2022-09-01T15:58:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.