NyayaAnumana & INLegalLlama: The Largest Indian Legal Judgment Prediction Dataset and Specialized Language Model for Enhanced Decision Analysis
- URL: http://arxiv.org/abs/2412.08385v1
- Date: Wed, 11 Dec 2024 13:50:17 GMT
- Title: NyayaAnumana & INLegalLlama: The Largest Indian Legal Judgment Prediction Dataset and Specialized Language Model for Enhanced Decision Analysis
- Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya
- Abstract summary: This paper introduces NyayaAnumana, the largest and most diverse corpus of Indian legal cases compiled for legal judgment prediction (LJP).
NyayaAnumana includes a wide range of cases from the Supreme Court, High Courts, Tribunal Courts, District Courts, and Daily Orders.
In addition to the dataset, we present INLegalLlama, a domain-specific generative large language model (LLM) tailored to the intricacies of the Indian legal system.
- Score: 5.790242888372048
- Abstract: The integration of artificial intelligence (AI) in legal judgment prediction (LJP) has the potential to transform the legal landscape, particularly in jurisdictions like India, where a significant backlog of cases burdens the legal system. This paper introduces NyayaAnumana, the largest and most diverse corpus of Indian legal cases compiled for LJP, encompassing a total of 702,945 preprocessed cases. NyayaAnumana, whose name combines the words "Nyay" (judgment) and "Anuman" (prediction or inference) from most major Indian languages, includes a wide range of cases from the Supreme Court, High Courts, Tribunal Courts, District Courts, and Daily Orders, and thus provides unparalleled diversity and coverage. Our dataset surpasses existing datasets like PredEx and ILDC, offering a comprehensive foundation for advanced AI research in the legal domain. In addition to the dataset, we present INLegalLlama, a domain-specific generative large language model (LLM) tailored to the intricacies of the Indian legal system. It is developed through a two-phase training approach over a base LLaMA model: first, the model undergoes continual pretraining on Indian legal documents; second, it receives task-specific supervised fine-tuning. This method allows the model to achieve a deeper understanding of legal contexts. Our experiments demonstrate that incorporating diverse court data significantly boosts model accuracy, achieving approximately 90% F1-score in prediction tasks. INLegalLlama not only improves prediction accuracy but also offers comprehensible explanations, addressing the need for explainability in AI-assisted legal decisions.
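To make the two-phase recipe concrete, here is a minimal sketch of continual pretraining followed by task-specific supervised fine-tuning over a base LLaMA checkpoint using the Hugging Face Trainer API. The checkpoint name, corpus files, JSON field names, and hyperparameters are illustrative assumptions, not the authors' released configuration.

```python
# Sketch of the two-phase recipe: (1) continual pretraining on raw Indian legal
# text, (2) supervised fine-tuning on judgment prediction with explanations.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; the paper only says "a base LLaMA model"
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Phase 1: continual pretraining on raw legal documents (causal LM objective).
raw = load_dataset("text", data_files={"train": "indian_legal_corpus.txt"})  # hypothetical corpus file

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=2048)

lm_data = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tok, mlm=False)  # labels are the (shifted) input ids

Trainer(
    model=model,
    args=TrainingArguments("cpt_ckpt", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1),
    train_dataset=lm_data,
    data_collator=collator,
).train()

# Phase 2: supervised fine-tuning, with case facts as the prompt and the
# outcome plus reasoning as the target ('facts' / 'decision_with_reasoning'
# are hypothetical field names).
def format_example(ex):
    text = (f"Case facts:\n{ex['facts']}\n\n"
            f"Predict the outcome and explain:\n{ex['decision_with_reasoning']}")
    return tok(text, truncation=True, max_length=2048)

sft_raw = load_dataset("json", data_files={"train": "ljp_sft.jsonl"})  # hypothetical SFT file
sft_data = sft_raw["train"].map(format_example,
                                remove_columns=sft_raw["train"].column_names)

Trainer(
    model=model,
    args=TrainingArguments("sft_ckpt", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=2),
    train_dataset=sft_data,
    data_collator=collator,
).train()
```

In practice the second phase would typically use parameter-efficient adapters (e.g., LoRA) and a prompt/response loss mask, but the skeleton above captures the pretrain-then-finetune flow the abstract describes.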
Related papers
- LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain.
LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP).
We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z)
- Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts [6.339932924789635]
Prediction with Explanation (PredEx) is the largest expert-annotated dataset for legal judgment prediction and explanation in the Indian context.
This corpus significantly enhances the training and evaluation of AI models in legal analysis.
arXiv Detail & Related papers (2024-06-06T14:57:48Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- SLJP: Semantic Extraction based Legal Judgment Prediction [0.0]
Legal Judgment Prediction (LJP) is a judicial assistance system that recommends legal components such as the applicable statutes, prison term, and penalty term.
Most existing Indian models do not adequately concentrate on the semantics embedded in the fact description (FD), which impact the decision.
The proposed semantic extraction based LJP (SLJP) model leverages pretrained transformers to understand complex, unstructured legal case documents.
arXiv Detail & Related papers (2023-12-13T08:50:02Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
- Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law [7.366081387295463]
We re-train two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian legal data, as well as train a model from scratch with a vocabulary based on Indian legal text.
We observe that our approach not only enhances performance on the new domain (Indian texts) but also on the original domain (European and UK texts).
arXiv Detail & Related papers (2022-09-13T15:01:11Z)
- Predicting Indian Supreme Court Judgments, Decisions, Or Appeals [0.403831199243454]
We introduce our newly developed ML-enabled legal prediction model and its operational prototype, eLegPredict.
eLegPredict is trained and tested on 3,072 Supreme Court cases and achieves 76% accuracy (F1-score).
eLegPredict is equipped with a mechanism to aid end users: as soon as a document containing a new case description is dropped into a designated directory, the system reads its content and generates a prediction (a generic sketch of such a workflow follows this entry).
arXiv Detail & Related papers (2021-09-28T18:28:43Z)
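The directory-drop mechanism described in the eLegPredict entry above is generic enough to sketch. Below is a minimal polling watcher, assuming plain-text case files; the `predict()` stub stands in for whatever trained classifier the system actually uses, and all paths are placeholders.

```python
# Minimal sketch of a directory-watching prediction workflow like the one the
# eLegPredict abstract describes; paths and predict() are placeholders.
import time
from pathlib import Path

WATCH_DIR = Path("incoming_cases")   # the "designated directory" for new case files
DONE_DIR = Path("processed_cases")
WATCH_DIR.mkdir(exist_ok=True)
DONE_DIR.mkdir(exist_ok=True)

def predict(case_text: str) -> str:
    """Stand-in for the trained judgment-prediction model."""
    return "predicted outcome: appeal allowed"  # dummy output

def watch(poll_seconds: float = 2.0) -> None:
    while True:
        for doc in WATCH_DIR.glob("*.txt"):
            text = doc.read_text(encoding="utf-8")
            outcome = predict(text)
            # Write the prediction next to the case and archive the input file,
            # so it is not picked up again on the next polling pass.
            (DONE_DIR / f"{doc.stem}_prediction.txt").write_text(outcome, encoding="utf-8")
            doc.rename(DONE_DIR / doc.name)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```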
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for Chinese legal long document understanding.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering (a generic long-document encoding sketch follows this entry).
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
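As a generic illustration of the long-document encoding that Lawformer targets, the snippet below runs a public Longformer checkpoint over a long judgment and takes the first-token representation as a document embedding; the English `allenai/longformer-base-4096` model is used here only as a stand-in for the released Lawformer weights.

```python
# Generic long-document encoding sketch in the spirit of Lawformer; the public
# English checkpoint is a stand-in, not the authors' released model.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "allenai/longformer-base-4096"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

long_judgment = " ".join(["The appellant was convicted under Section 302."] * 500)  # toy long document
inputs = tok(long_judgment, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# First-token representation as a document embedding for downstream tasks
# such as judgment prediction or similar-case retrieval.
doc_embedding = outputs.last_hidden_state[:, 0]
print(doc_embedding.shape)  # torch.Size([1, 768])
```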