LawSum: A weakly supervised approach for Indian Legal Document Summarization
- URL: http://arxiv.org/abs/2110.01188v2
- Date: Tue, 5 Oct 2021 16:28:30 GMT
- Title: LawSum: A weakly supervised approach for Indian Legal Document Summarization
- Authors: Vedant Parikh, Vidit Mathur, Parth Metha, Namita Mittal, Prasenjit Majumder
- Abstract summary: We propose a new dataset consisting of over 10,000 judgements delivered by the Supreme Court of India.
The proposed dataset is pre-processed by normalising common legal abbreviations.
We also annotate each judgement with several attributes, such as the date and the names of the plaintiffs, the defendants and the people representing them.
- Score: 1.7284359928761968
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unlike those of courts in Western countries, the public records of the
Indian judiciary are completely unstructured and noisy. No large-scale, publicly
available annotated datasets of Indian legal documents exist to date, which
limits the scope for legal analytics research. In this work, we propose a new
dataset consisting of over 10,000 judgements delivered by the Supreme Court of
India and their corresponding hand-written summaries. The dataset is
pre-processed by normalising common legal abbreviations, handling spelling
variations in named entities, handling bad punctuation and tokenizing sentences
accurately. Each sentence is tagged with its rhetorical role. We also annotate
each judgement with several attributes, such as the date, the names of the
plaintiffs, the defendants and the people representing them, the judges who
delivered the judgement, the acts/statutes that are cited and the most common
citations used to refer to the judgement. Further, we propose an automatic
labelling technique for identifying sentences that contain summary-worthy
information. We demonstrate that this auto-labelled data can be used effectively
to train a weakly supervised sentence extractor with high accuracy. Besides legal
document summarization, possible applications of this dataset include retrieval,
citation analysis and prediction of decisions by a particular judge.
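The abstract does not say how the automatic labelling of summary-worthy sentences works. As an illustration only, the sketch below uses a common weak-labelling heuristic from extractive summarization, a greedy oracle that keeps adding the judgement sentence that most improves a ROUGE-1-style unigram F1 against the hand-written summary. This is an assumed stand-in, not necessarily the LawSum authors' technique; the function names, the `max_sentences` cap and the toy sentences are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokenizer (assumption: no legal-specific rules)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate: list[str], reference: list[str]) -> float:
    """Clipped unigram-overlap F1, a simplified stand-in for ROUGE-1 F1."""
    if not candidate or not reference:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle_labels(sentences: list[str], summary: str,
                         max_sentences: int = 3) -> list[int]:
    """Label sentences 1/0 by greedily adding whichever sentence most improves
    overlap with the reference summary; stop when no sentence helps or when
    max_sentences (a hypothetical cap) is reached."""
    reference = tokenize(summary)
    tokenized = [tokenize(s) for s in sentences]
    selected: list[int] = []
    best_score = 0.0
    while len(selected) < max_sentences:
        best_idx = None
        for i in range(len(tokenized)):
            if i in selected:
                continue
            # Score the current selection extended by sentence i.
            candidate = [tok for j in selected + [i] for tok in tokenized[j]]
            score = unigram_f1(candidate, reference)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return [1 if i in selected else 0 for i in range(len(sentences))]


# Toy judgement and summary (invented for illustration, not from the dataset).
judgement = [
    "The appellant was convicted under Section 302 of the Indian Penal Code.",
    "Counsel for both parties were heard at length.",
    "The appeal is dismissed and the conviction is upheld.",
]
summary = "The conviction under Section 302 IPC is upheld and the appeal is dismissed."
print(greedy_oracle_labels(judgement, summary))  # → [0, 0, 1]
```

The resulting 0/1 labels could then serve as weak targets for a sentence extractor. The toy case also hints at why the abbreviation normalisation mentioned in the abstract matters: without it, "IPC" in the summary never matches "Indian Penal Code" in the judgement, so those tokens add nothing to the first sentence's score.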
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
(2024-10-31T18:43:12Z)
- Breaking the Manual Annotation Bottleneck: Creating a Comprehensive Legal Case Criticality Dataset through Semi-Automated Labeling [16.529070321280447]
This paper introduces the Criticality Prediction dataset, a new resource for evaluating the potential influence of Swiss Supreme Court decisions on future jurisprudence.
Unlike existing approaches that rely on resource-intensive manual annotations, we semi-automatically derive labels leading to a much larger dataset.
We evaluate several multilingual models, including fine-tuned variants and large language models, and find that fine-tuned models consistently outperform zero-shot baselines.
(2024-10-17T11:43:16Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
(2024-03-27T10:40:14Z)
- Low-Resource Court Judgment Summarization for Common Law Systems [32.13166048504629]
We present CLSum, the first dataset for summarizing multi-jurisdictional common law court judgment documents.
This is the first court judgment summarization work adopting large language models (LLMs) in data augmentation, summary generation, and evaluation.
(2024-03-07T12:47:42Z)
- SLJP: Semantic Extraction based Legal Judgment Prediction [0.0]
Legal Judgment Prediction (LJP) is a judicial assistance system that recommends legal components such as the applicable statutes, the prison term and the penalty term.
Most existing Indian models do not adequately capture the semantics embedded in the fact description (FD) that influence the decision.
The proposed semantic extraction based LJP (SLJP) model leverages pretrained transformers to understand complex, unstructured legal case documents.
(2023-12-13T08:50:02Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal elements, with sentence-level legal element annotations.
(2023-10-24T08:17:11Z)
- CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing [44.75251805925605]
We introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain, drawn from the Caselaw Access Project (CAP).
The performance of various deep learning models was examined on this novel dataset.
The domain-specific pre-trained model tends to outperform other models, with an 88% F1-score for the citation-worthiness detection task.
(2023-05-03T04:20:56Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
(2023-04-22T10:47:01Z)
- Exploiting Contrastive Learning and Numerical Evidence for Confusing Legal Judgment Prediction [46.71918729837462]
Given the fact description text of a legal case, legal judgment prediction aims to predict the case's charge, law article and penalty term.
Previous studies fail to distinguish different classification errors with a standard cross-entropy classification loss.
We propose a MoCo-based supervised contrastive learning method to learn distinguishable representations.
We further enhance the representation of the fact description with extracted crime amounts which are encoded by a pre-trained numeracy model.
(2022-11-15T15:53:56Z)
- Fine-grained Intent Classification in the Legal Domain [2.088409822555567]
We introduce a dataset of 93 legal documents, each belonging to one of four case categories: Murder, Land Dispute, Robbery or Corruption.
We annotate each intent phrase in these documents with fine-grained intents to give a reader a deeper understanding of the case.
We analyze the performance of several transformer-based models in automating the process of extracting intent phrases.
(2022-05-06T23:57:17Z)
- JUSTICE: A Benchmark Dataset for Supreme Court's Judgment Prediction [0.0]
We aim to create a high-quality dataset of Supreme Court of the United States (SCOTUS) cases that can be readily used in natural language processing (NLP) research and other data-driven applications.
Models trained on previous court cases with advanced NLP algorithms are able to predict and classify a court's judgment.
(2021-12-06T23:19:08Z)