LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification
- URL: http://arxiv.org/abs/2502.05836v1
- Date: Sun, 09 Feb 2025 10:07:05 GMT
- Title: LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification
- Authors: Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya,
- Abstract summary: We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles.
Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
- Score: 6.549338652948716
- License:
- Abstract: In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.
Related papers
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z) - Harnessing GPT-3.5-turbo for Rhetorical Role Prediction in Legal Cases [0.16385815610837165]
We propose a comprehensive study of one-stage elicitation techniques for querying a large pre-trained generative transformer (GPT-3.5-turbo) in the rhetorical role prediction task of legal cases.
We show that the number of examples, the definition of labels, the presentation of the (labelled) textual context and specific questions about this context have a positive influence on the performance of the model.
arXiv Detail & Related papers (2023-10-26T14:19:48Z) - Enhancing Pre-Trained Language Models with Sentence Position Embeddings
for Rhetorical Roles Recognition in Legal Opinions [0.16385815610837165]
The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions.
We propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information.
Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs.
arXiv Detail & Related papers (2023-10-08T20:33:55Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - RankCSE: Unsupervised Sentence Representations Learning via Learning to
Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
An extensive set of experiments are conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z) - Rhetorical Role Labeling of Legal Documents using Transformers and Graph
Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform the task of rhetorical role labelling on Indian Court Judgements as part of SemEval Task 6: understanding legal texts, shared subtask A.
arXiv Detail & Related papers (2023-05-06T17:04:51Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and
Entailment Recognition [63.51569687229681]
We argue for the need to recognize the textual entailment relation of each proposition in a sentence individually.
We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters.
Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document.
arXiv Detail & Related papers (2022-12-21T04:03:33Z) - Semantic Segmentation of Legal Documents via Rhetorical Roles [3.285073688021526]
This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units.
We develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document.
arXiv Detail & Related papers (2021-12-03T10:49:19Z) - Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z) - Legal Document Classification: An Application to Law Area Prediction of
Petitions to Public Prosecution Service [6.696983725360808]
This paper proposes the use of NLP techniques for textual classification.
Our main goal is to automate the process of assigning petitions to their respective areas of law.
The best results were obtained with a combination of Word2Vec trained on a domain-specific corpus and a Recurrent Neural Network architecture.
arXiv Detail & Related papers (2020-10-13T18:05:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.