Related papers: Semantic Segmentation of Legal Documents via Rhetorical Roles

URL: http://arxiv.org/abs/2112.01836v1
Date: Fri, 3 Dec 2021 10:49:19 GMT
Title: Semantic Segmentation of Legal Documents via Rhetorical Roles
Authors: Vijit Malik and Rishabh Sanjay and Shouvik Kumar Guha and Shubham Kumar Nigam and Angshuman Hazarika and Arnab Bhattacharya and Ashutosh Modi
Abstract summary: This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units. We develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document.
Score: 3.285073688021526
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Legal documents are unstructured, use legal jargon, and are considerably long, making them difficult to process automatically with conventional text processing techniques. A legal document processing system would benefit substantially if the documents could be semantically segmented into coherent units of information. This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units: facts, arguments, statute, issue, precedent, ruling, and ratio. With the help of legal experts, we propose a set of 13 fine-grained rhetorical role labels and create a new corpus of legal documents annotated with the proposed RR labels. We develop a system for segmenting a document into rhetorical role units. In particular, we develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document. We experiment extensively with various deep learning models for predicting rhetorical roles in a document, and the proposed model outperforms the existing models. Further, we apply RR to predicting the judgment of legal cases and show that using RR improves the prediction compared to transformer-based models.
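The abstract names rhetorical role label shift as the auxiliary task but does not define it. The sketch below assumes one plausible formulation: a sentence gets a shift target of 1 when its role differs from the preceding sentence's role, and 0 otherwise. It also groups consecutive sentences that share a role into the "rhetorical role units" the abstract describes. The function names and the example labels (drawn from the coarse units listed above) are hypothetical, and the paper's exact definition may differ.

```python
from itertools import groupby
from typing import List, Tuple


def label_shift_targets(labels: List[str]) -> List[int]:
    """Binary auxiliary targets: 1 if sentence i's role differs from sentence i-1's.

    Assumption: the first sentence has no predecessor, so it is assigned 0.
    """
    if not labels:
        return []
    return [0] + [int(prev != cur) for prev, cur in zip(labels, labels[1:])]


def role_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive sentences with the same role into (role, start, end) units.

    `end` is exclusive, so sentences[start:end] recovers one unit.
    """
    units = []
    start = 0
    for role, group in groupby(labels):
        length = sum(1 for _ in group)
        units.append((role, start, start + length))
        start += length
    return units


# Hypothetical per-sentence labels for a five-sentence judgment.
sentence_roles = ["Facts", "Facts", "Arguments", "Arguments", "Ruling"]
print(label_shift_targets(sentence_roles))  # [0, 0, 1, 0, 1]
print(role_segments(sentence_roles))
```

Under this formulation, every 1 in the shift targets marks the start of a new unit, so a model that predicts shifts well gives the role classifier a useful signal about where segment boundaries fall.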

Related papers

MARRO: Multi-headed Attention for Rhetorical Role Labeling in Legal Documents [8.596233578884162]
Identification of rhetorical roles like facts, arguments, and final judgments is central to understanding a legal case document. Legal documents are often unstructured and contain a specialized vocabulary, making it hard for conventional transformer models to understand them. We propose a novel family of multi-task learning-based models for rhetorical role labeling, named MARRO, that uses transformer-inspired multi-headed attention.
arXiv (2025-03-08T08:05:20Z)
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification [6.549338652948716]
We introduce LegalSeg, the largest annotated dataset for rhetorical role classification, comprising over 7,000 documents and 1.4 million sentences labeled with 7 rhetorical roles. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
arXiv (2025-02-09T10:07:05Z)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance. We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods. In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv (2024-10-31T18:43:12Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv (2024-10-03T17:49:09Z)
Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings: first, an alternative contrastive learning objective that explicitly incorporates document neighbors into the intra-batch contextual loss; second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv (2024-10-03T14:33:34Z)
HiCuLR: Hierarchical Curriculum Learning for Rhetorical Role Labeling of Legal Documents [1.2562034805037443]
HiCuLR is a hierarchical curriculum learning framework for Rhetorical Role Labeling. It nests two curricula: Rhetorical Role-level Curriculum (RC) on the outer layer and Document-level Curriculum (DC) on the inner layer.
arXiv (2024-09-27T11:28:01Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv (2024-03-27T10:40:14Z)
Enhancing Pre-Trained Language Models with Sentence Position Embeddings for Rhetorical Roles Recognition in Legal Opinions [0.16385815610837165]
The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions. We propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information. Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs.
arXiv (2023-10-08T20:33:55Z)
Rhetorical Role Labeling of Legal Documents using Transformers and Graph Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform rhetorical role labelling on Indian court judgments as part of SemEval-2023 Task 6 (LegalEval: Understanding Legal Texts), Subtask A.
arXiv (2023-05-06T17:04:51Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in intelligent legal systems. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose SAILER, a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv (2023-04-22T10:47:01Z)
Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions [7.99536002595393]
We propose an unsupervised graph-based ranking model that uses a reweighting algorithm to exploit document structure. Results on the Canadian Legal Case Law dataset show that our proposed method outperforms several strong baselines.
arXiv (2022-11-06T22:20:42Z)
Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv (2022-04-22T21:47:04Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion. Experimental results on the FEVER dataset show that GERE achieves significant improvements over state-of-the-art baselines.
arXiv (2022-04-12T03:49:35Z)
Corpus for Automatic Structuring of Legal Documents [1.8025738207124173]
We introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. We show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction.
arXiv (2022-01-31T11:12:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.