Related papers: To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

URL: http://arxiv.org/abs/2408.02257v1
Date: Mon, 5 Aug 2024 06:16:31 GMT
Title: To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction
Authors: Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, Jey Han Lau,
Abstract summary: We use a corpus of problem descriptions written by laypeople in English that is annotated by practising lawyers. Inherent subjectivity exists in our task because legal area categorisation is a complex task.
Score: 44.5492443909544
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores the task of automatic prediction of text spans in a legal problem description that support a legal area label. We use a corpus of problem descriptions written by laypeople in English that is annotated by practising lawyers. Inherent subjectivity exists in our task because legal area categorisation is a complex task, and lawyers often have different views on a problem, especially in the face of legally-imprecise descriptions of issues. Experiments show that training on majority-voted spans outperforms training on disaggregated ones.

Related papers

AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models. Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment [0.0]
This paper develops and applies a novel taxonomy for topic modelling summary judgment cases in the United Kingdom. Using a curated dataset of summary judgment cases, we use the Large Language Model Claude 3 Opus to explore functional topics and trends. We find that Claude 3 Opus correctly classified the topic with an accuracy of 87.10%.
arXiv Detail & Related papers (2024-05-21T16:30:25Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present M, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models [10.834755282333589]
Long-form Legal Question Answering dataset comprises 1,868 expert-annotated legal questions in the French language. Our experimental results demonstrate promising performance on automatic evaluation metrics. As one of the only comprehensive, expert-annotated long-form LQA dataset, LLeQA has the potential to not only accelerate research towards resolving a significant real-world issue, but also act as a rigorous benchmark for evaluating NLP models in specialized domains.
arXiv Detail & Related papers (2023-09-29T08:23:19Z)
Prototype-Based Interpretability for Legal Citation Prediction [16.660004925391842]
We design the task with parallels to the thought-process of lawyers, i.e., with reference to both precedents and legislative provisions. After initial experimental results, we refine the target citation predictions with the feedback of legal experts. We introduce a prototype architecture to add interpretability, achieving strong performance while adhering to decision parameters used by lawyers.
arXiv Detail & Related papers (2023-05-25T21:40:58Z)
CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing [44.75251805925605]
We introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain from the Caselaw Access Project (CAP) The performance of various deep learning models was examined on this novel dataset. The domain-specific pre-trained model tends to outperform other models, with an 88% F1-score for the citation-worthiness detection task.
arXiv Detail & Related papers (2023-05-03T04:20:56Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
Exploiting Contrastive Learning and Numerical Evidence for Confusing Legal Judgment Prediction [46.71918729837462]
Given the fact description text of a legal case, legal judgment prediction aims to predict the case's charge, law article and penalty term. Previous studies fail to distinguish different classification errors with a standard cross-entropy classification loss. We propose a moco-based supervised contrastive learning to learn distinguishable representations. We further enhance the representation of the fact description with extracted crime amounts which are encoded by a pre-trained numeracy model.
arXiv Detail & Related papers (2022-11-15T15:53:56Z)
Legal Element-oriented Modeling with Multi-view Contrastive Learning for Legal Case Retrieval [3.909749182759558]
We propose an interaction-focused network for legal case retrieval with a multi-view contrastive learning objective. Case-view contrastive learning minimizes the hidden space distance between relevant legal case representations. We employ a legal element knowledge-aware indicator to detect legal elements of cases.
arXiv Detail & Related papers (2022-10-11T06:47:23Z)
Important Sentence Identification in Legal Cases Using Multi-Class Classification [0.1499944454332829]
This research explores the usage of sentence embeddings for multi-class classification to identify important sentences in a legal case. A task-specific loss function is defined in order to improve the accuracy restricted by the straightforward use of categorical cross entropy loss.
arXiv Detail & Related papers (2021-11-10T14:58:29Z)
Aspect Classification for Legal Depositions [0.0]
It is important to know not only about liability, but also about events, accidents, physical conditions, and treatments. A legal deposition consists of various aspects that are discussed as part of the deponent testimony. Our methods have achieved a classification F1 score of 0.83.
arXiv Detail & Related papers (2020-09-09T18:00:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.