Related papers: Corpus for Automatic Structuring of Legal Documents

URL: http://arxiv.org/abs/2201.13125v1
Date: Mon, 31 Jan 2022 11:12:44 GMT
Title: Corpus for Automatic Structuring of Legal Documents
Authors: Prathamesh Kalamkar and Aman Tiwari and Astha Agarwal and Saurabh Karn and Smita Gupta and Vivek Raghavan and Ashutosh Modi
Abstract summary: We introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. We show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction.
Score: 1.8025738207124173
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In populous countries, pending legal cases have been growing exponentially. There is a need for developing techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label coming from a list of pre-defined Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper.

Related papers

Incorporating Legal Structure in Retrieval-Augmented Generation: A Case Study on Copyright Fair Use [44.99833362998488]
This paper presents a domain-specific implementation of Retrieval-Augmented Generation tailored to the Fair Use Doctrine in U.S. copyright law. Motivated by the increasing prevalence of DMCA takedowns and the lack of accessible legal support for content creators, we propose a structured approach that combines semantic search with legal knowledge graphs and court citation networks to improve retrieval quality and reasoning reliability.
arXiv Detail & Related papers (2025-05-04T15:53:49Z)
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification [6.549338652948716]
We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
arXiv Detail & Related papers (2025-02-09T10:07:05Z)
Knowledge Graphs Construction from Criminal Court Appeals: Insights from the French Cassation Court [49.1574468325115]
This paper presents a framework for constructing knowledge graphs from appeals to the French Cassation Court. The framework includes a domain-specific ontology and a derived dataset, offering a foundation for structured legal data representation and analysis.
arXiv Detail & Related papers (2025-01-24T15:38:32Z)
CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation [44.67578050648625]
We transform a large open-source legal corpus into a dataset supporting information retrieval (IR) and retrieval-augmented generation (RAG). This dataset, CLERC, is constructed for training and evaluating models on their ability to (1) find corresponding citations for a given piece of legal analysis and (2) compile the text of these citations into a cogent analysis that supports a reasoning goal.
arXiv Detail & Related papers (2024-06-24T23:57:57Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
Enhancing Pre-Trained Language Models with Sentence Position Embeddings for Rhetorical Roles Recognition in Legal Opinions [0.16385815610837165]
The size of legal opinions continues to grow, making it increasingly challenging to develop a model that can accurately predict the rhetorical roles of legal opinions. We propose a novel model architecture for automatically predicting rhetorical roles using pre-trained language models (PLMs) enhanced with knowledge of sentence position information. Based on an annotated corpus from the LegalEval@SemEval2023 competition, we demonstrate that our approach requires fewer parameters, resulting in lower computational costs.
arXiv Detail & Related papers (2023-10-08T20:33:55Z)
Rhetorical Role Labeling of Legal Documents using Transformers and Graph Neural Networks [1.290382979353427]
This paper presents the approaches undertaken to perform the task of rhetorical role labelling on Indian Court Judgements as part of SemEval Task 6: understanding legal texts, shared subtask A.
arXiv Detail & Related papers (2023-05-06T17:04:51Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
Fine-grained Intent Classification in the Legal Domain [2.088409822555567]
We introduce a dataset of 93 legal documents, belonging to the case categories of either Murder, Land Dispute, Robbery, or Corruption. We annotate fine-grained intents for each such phrase to enable a deeper understanding of the case for a reader. We analyze the performance of several transformer-based models in automating the process of extracting intent phrases.
arXiv Detail & Related papers (2022-05-06T23:57:17Z)
Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding. UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input. An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Semantic Segmentation of Legal Documents via Rhetorical Roles [3.285073688021526]
This paper proposes a Rhetorical Roles (RR) system for segmenting a legal document into semantically coherent units. We develop a multitask learning-based deep learning model with document rhetorical role label shift as an auxiliary task for segmenting a legal document.
arXiv Detail & Related papers (2021-12-03T10:49:19Z)
StateCensusLaws.org: A Web Application for Consuming and Annotating Legal Discourse Learning [89.77347919191774]
We create a web application to highlight the output of NLP models trained to parse and label discourse segments in law text. We focus on state-level law that uses U.S. Census population numbers to allocate resources and organize government.
arXiv Detail & Related papers (2021-04-20T22:00:54Z)
Rule-Based Approach for Party-Based Sentiment Analysis in Legal Opinion Texts [0.3364569898365254]
Party-based sentiment analysis will play a key role in the automation system by identifying opinion values with respect to each legal party in legal texts. Lawyers and legal officials have to spend considerable effort and time to obtain the required information manually from legal opinion texts.
arXiv Detail & Related papers (2020-11-11T10:07:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.