Related papers: IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning

URL: http://arxiv.org/abs/2407.05399v1
Date: Sun, 7 Jul 2024 14:55:04 GMT
Title: IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning
Authors: Abhinav Joshi, Shounak Paul, Akshat Sharma, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi
Abstract summary: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents. This paper proposes IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning.
Score: 16.12863746776168
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multi-lingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based) for each task, outlining the gap between models and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems.

Related papers

VLQA: The First Comprehensive, Large, and High-Quality Vietnamese Dataset for Legal Question Answering [4.546567493379192]
We introduce the VLQA dataset, a comprehensive and high-quality resource tailored for the Vietnamese legal domain. We also conduct a comprehensive statistical analysis of the dataset and evaluate its effectiveness.
arXiv (2025-07-26T16:26:50Z)
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification [6.549338652948716]
We introduce LegalSeg, the largest annotated dataset for rhetorical role classification, comprising over 7,000 documents and 1.4 million sentences labeled with 7 rhetorical roles. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features.
arXiv (2025-02-09T10:07:05Z)
LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv (2024-12-23T04:02:46Z)
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges [4.548047308860141]
Natural Language Processing is revolutionizing the way legal professionals and laypersons operate in the legal field. This survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, reviewing 148 studies, with a final selection of 127 after manual filtering. It explores foundational concepts related to Natural Language Processing in the legal domain.
arXiv (2024-10-25T01:17:02Z)
InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws. We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries. InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv (2024-06-21T06:19:03Z)
Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI. Precedents are earlier legal cases with similar facts that serve as the basis for judging subsequent cases in national legal systems. Recent advances in deep learning have enabled a variety of techniques to be applied to the LJP task.
arXiv (2023-10-13T16:47:20Z)
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support [18.810320088441678]
This work introduces a novel NLP benchmark for the legal domain. It challenges LLMs in five key dimensions, including processing long documents (up to 50K tokens), using domain-specific knowledge (embodied in legal texts), and multilingual understanding (covering five languages). Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual legal system.
arXiv (2023-06-15T16:19:15Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv (2023-04-22T10:47:01Z)
An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv (2022-11-10T14:26:43Z)
Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law [7.366081387295463]
We re-train two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian legal data, and also train a model from scratch with a vocabulary based on Indian legal text. We observe that our approach enhances performance not only on the new domain (Indian texts) but also on the original domain (European and UK texts).
arXiv (2022-09-13T15:01:11Z)
Indian Legal Text Summarization: A Text Normalisation-based Approach [0.0]
There are more than 4 crore (40 million) cases pending in the Indian court system. Many state-of-the-art models for text summarization have emerged as machine learning has progressed, but domain-independent models do not perform well on legal texts. The authors propose a methodology for normalising legal texts in the Indian context.
arXiv (2022-06-13T15:16:50Z)
Indian Legal NLP Benchmarks : A Survey [0.0]
There is a need to create separate Natural Language Processing benchmarks for Indian legal text. This will spur innovation in applications of Natural Language Processing to Indian legal text.
arXiv (2021-07-13T13:10:10Z)
Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents. We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv (2021-05-09T09:39:25Z)
How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [81.04070052740596]
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain. This paper introduces the history, the current state, and the future directions of research in LegalAI.
arXiv (2020-04-25T14:45:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.