MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of
Indian Legal Case Judgments
- URL: http://arxiv.org/abs/2310.18600v1
- Date: Sat, 28 Oct 2023 05:51:57 GMT
- Title: MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of
Indian Legal Case Judgments
- Authors: Debtanu Datta, Shubham Soni, Rajdeep Mukherjee, Saptarshi Ghosh
- Abstract summary: It is crucial to summarize the legal documents in Indian languages to ensure equitable access to justice.
This study presents a pioneering effort toward cross-lingual summarization of English legal documents into Hindi.
We construct the first high-quality legal corpus comprising 3,122 case judgments from prominent Indian courts in English, along with their summaries in both English and Hindi.
- Score: 6.522489660886997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic summarization of legal case judgments is a practically important
problem that has attracted substantial research efforts in many countries. In
the context of the Indian judiciary, there is an additional complexity --
Indian legal case judgments are mostly written in complex English, but a
significant portion of India's population lacks command of the English
language. Hence, it is crucial to summarize the legal documents in Indian
languages to ensure equitable access to justice. While prior research primarily
focuses on summarizing legal case judgments in their source languages, this
study presents a pioneering effort toward cross-lingual summarization of
English legal documents into Hindi, the most frequently spoken Indian language.
We construct the first high-quality legal corpus comprising 3,122 case
judgments from prominent Indian courts in English, along with their summaries
in both English and Hindi, drafted by legal practitioners. We benchmark the
performance of several diverse summarization approaches on our corpus and
demonstrate the need for further research in cross-lingual summarization in the
legal domain.
Related papers
- To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction [44.5492443909544]
We use a corpus of problem descriptions written by laypeople in English that is annotated by practising lawyers.
Inherent subjectivity exists in our task because legal area categorisation is a complex task.
arXiv Detail & Related papers (2024-08-05T06:16:31Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition [7.030684932312313]
This paper introduces the first fundamental research for the Vietnamese language in the legal domain: legal textual entailment recognition.
We discuss certain linguistic aspects critical in the legal domain that pose challenges that need to be addressed.
arXiv Detail & Related papers (2024-03-06T03:42:06Z)
- LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset [0.0]
This study addresses the gap in the literature working with large legal corpora about how to isolate cases, in our case summary judgments, from a large corpus of UK court decisions.
We use the Cambridge Law Corpus of 356,011 UK court decisions and determine that the large language model achieves a weighted F1 score of 0.94 versus 0.78 for keywords.
We identify and extract 3,102 summary judgment cases, enabling us to map their distribution across various UK courts over a temporal span.
arXiv Detail & Related papers (2024-03-04T10:13:30Z)
- Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning [49.23103067844278]
We propose the task of multi-defendant LJP, which aims to automatically predict the judgment results for each defendant of multi-defendant cases.
Two challenges arise with the task of multi-defendant LJP: (1) indistinguishable judgment results among various defendants; and (2) the lack of a real-world dataset for training and evaluation.
arXiv Detail & Related papers (2023-12-10T04:46:30Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law [7.366081387295463]
We re-train two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian legal data, as well as train a model from scratch with a vocabulary based on Indian legal text.
We observe that our approach not only enhances performance on the new domain (Indian texts) but also over the original domain (European and UK texts).
arXiv Detail & Related papers (2022-09-13T15:01:11Z)
- A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction [19.89425856249463]
We present the first large-scale benchmark of Korean legal AI datasets, LBox Open.
The legal corpus consists of 150k Korean precedents (264M tokens), of which 63k were sentenced in the last 4 years.
The two classification tasks are case names (10k) and statutes (3k) prediction from the factual description of individual cases.
The LJP tasks consist of (1) 11k criminal examples where the model is asked to predict fine amount, imprisonment with labor, and imprisonment without labor ranges for the given facts.
arXiv Detail & Related papers (2022-06-10T16:51:45Z)
- Indian Legal NLP Benchmarks: A Survey [0.0]
There is a need to create separate Natural Language Processing benchmarks for Indian Legal Text.
This will spur innovation in applications of Natural Language Processing for Indian Legal Text.
arXiv Detail & Related papers (2021-07-13T13:10:10Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.