Related papers: Breaking the Manual Annotation Bottleneck: Creating a Comprehensive Legal Case Criticality Dataset through Semi-Automated Labeling

Breaking the Manual Annotation Bottleneck: Creating a Comprehensive Legal Case Criticality Dataset through Semi-Automated Labeling

URL: http://arxiv.org/abs/2410.13460v1
Date: Thu, 17 Oct 2024 11:43:16 GMT
Title: Breaking the Manual Annotation Bottleneck: Creating a Comprehensive Legal Case Criticality Dataset through Semi-Automated Labeling
Authors: Ronja Stern, Ken Kawamura, Matthias Stürmer, Ilias Chalkidis, Joel Niklaus,
Abstract summary: This paper introduces the Criticality Prediction dataset, a new resource for evaluating the potential influence of Swiss Supreme Court decisions on future jurisprudence. Unlike existing approaches that rely on resource-intensive manual annotations, we semi-automatically derive labels leading to a much larger dataset. We evaluate several multilingual models, including fine-tuned variants and large language models, and find that fine-tuned models consistently outperform zero-shot baselines.
Score: 16.529070321280447
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Predicting case criticality helps legal professionals in the court system manage large volumes of case law. This paper introduces the Criticality Prediction dataset, a new resource for evaluating the potential influence of Swiss Federal Supreme Court decisions on future jurisprudence. Unlike existing approaches that rely on resource-intensive manual annotations, we semi-automatically derive labels leading to a much larger dataset than otherwise possible. Our dataset features a two-tier labeling system: (1) the LD-Label, which identifies cases published as Leading Decisions (LD), and (2) the Citation-Label, which ranks cases by their citation frequency and recency. This allows for a more nuanced evaluation of case importance. We evaluate several multilingual models, including fine-tuned variants and large language models, and find that fine-tuned models consistently outperform zero-shot baselines, demonstrating the need for task-specific adaptation. Our contributions include the introduction of this task and the release of a multilingual dataset to the research community.

Related papers

Improving the Accuracy and Efficiency of Legal Document Tagging with Large Language Models and Instruction Prompts [0.6554326244334866]
Legal-LLM is a novel approach that leverages the instruction-following capabilities of Large Language Models (LLMs) through fine-tuning. We evaluate our method on two benchmark datasets, POSTURE50K and EURLEX57K, using micro-F1 and macro-F1 scores.
arXiv Detail & Related papers (2025-04-12T18:57:04Z)
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification [6.549338652948716]
We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
arXiv Detail & Related papers (2025-02-09T10:07:05Z)
DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning [54.35107462768146]
Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation. Existing CSAL methods overlook weak classes and hard representative examples, resulting in biased learning. This paper proposes a novel dual-diversity enhancing and uncertainty-aware framework for CSAL.
arXiv Detail & Related papers (2025-02-01T04:00:03Z)
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions [25.82451110740322]
This paper introduces CaseSumm, a novel dataset for long-context summarization in the legal domain. We collect 25.6K U.S. Supreme Court (SCOTUS) opinions and their official summaries, known as "syllabuses" Our dataset is the largest open legal case summarization dataset, and is the first to include summaries of SCOTUS decisions dating back to 1815.
arXiv Detail & Related papers (2024-12-30T19:00:01Z)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance. We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods. In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
A Small Claims Court for the NLP: Judging Legal Text Classification Strategies With Small Datasets [0.0]
This paper investigates the best strategies for optimizing the use of a small labeled dataset and large amounts of unlabeled data. We use the records of demands to a Brazilian Public Prosecutor's Office aiming to assign the descriptions in one of the subjects. The best result was obtained with Unsupervised Data Augmentation (UDA), which jointly uses BERT, data augmentation, and strategies of semi-supervised learning.
arXiv Detail & Related papers (2024-09-09T18:10:05Z)
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation [44.67578050648625]
We transform a large open-source legal corpus into a dataset supporting information retrieval (IR) and retrieval-augmented generation (RAG) This dataset CLERC is constructed for training and evaluating models on their ability to (1) find corresponding citations for a given piece of legal analysis and to (2) compile the text of these citations into a cogent analysis that supports a reasoning goal.
arXiv Detail & Related papers (2024-06-24T23:57:57Z)
Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Pretraining (CLIP) We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy. Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
Transfer-Free Data-Efficient Multilingual Slot Labeling [82.02076369811402]
Slot labeling is a core component of task-oriented dialogue (ToD) systems. To mitigate the inherent data scarcity issue, current research on multilingual ToD assumes that sufficient English-language annotated data are always available. We propose a two-stage slot labeling approach (termed TWOSL) which transforms standard multilingual sentence encoders into effective slot labelers.
arXiv Detail & Related papers (2023-05-22T22:47:32Z)
Retrieval-augmented Multi-label Text Classification [20.100081284294973]
Multi-label text classification is a challenging task in settings of large label sets. Retrieval augmentation aims to improve the sample efficiency of classification models. We evaluate this approach on four datasets from the legal and biomedical domains.
arXiv Detail & Related papers (2023-05-22T14:16:23Z)
Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language model. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models [8.745407715423992]
Cross-lingual document representations enable language understanding in multilingual contexts. Large pre-trained language models such as BERT, XLM and XLM-RoBERTa have achieved great success when fine-tuned on sentence-level downstream tasks.
arXiv Detail & Related papers (2021-06-07T07:14:00Z)
Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks. We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline [94.0601799665342]
Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task. We propose establishing summary-source alignment as an explicit task, while introducing two major novelties. We create a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. We present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
arXiv Detail & Related papers (2020-09-01T17:27:12Z)
Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition" The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors. Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
arXiv Detail & Related papers (2020-04-07T02:54:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.