Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
- URL: http://arxiv.org/abs/2209.06049v5
- Date: Mon, 15 May 2023 10:02:14 GMT
- Title: Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
- Authors: Shounak Paul, Arpan Mandal, Pawan Goyal and Saptarshi Ghosh
- Abstract summary: We re-train two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian legal data, and also train a model from scratch with a vocabulary based on Indian legal text.
We observe that our approach enhances performance not only on the new domain (Indian texts) but also on the original domain (European and UK texts).
- Score: 7.366081387295463
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP in the legal domain has seen increasing success with the emergence of
Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text.
PLMs trained over European and US legal text are available publicly; however,
legal text from other domains (countries), such as India, has many
distinguishing characteristics. With the rapidly increasing volume of Legal NLP
applications in various countries, it has become necessary to pre-train such
LMs over legal text of other countries as well. In this work, we attempt to
investigate pre-training in the Indian legal domain. We re-train (continue
pre-training) two popular legal PLMs, LegalBERT and CaseLawBERT, on Indian
legal data, as well as train a model from scratch with a vocabulary based on
Indian legal text. We apply these PLMs to three benchmark legal NLP tasks
(Legal Statute Identification from facts, Semantic Segmentation of Court
Judgment Documents, and Court Appeal Judgment Prediction) on both Indian
and non-Indian (EU, UK) datasets. We observe that our approach enhances
performance not only on the new domain (Indian texts) but also on the
original domain (European and UK texts). We also conduct explainability
experiments for a qualitative comparison of all these different PLMs.
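Continued pre-training re-runs a model's self-supervised objective on text from the new domain. For BERT-family models such as LegalBERT and CaseLawBERT, that objective includes masked language modeling (MLM). The sketch below shows the standard BERT masking step: select about 15% of the non-special tokens, replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. It is an illustration under stated assumptions, not the authors' training code; the function name `mask_tokens`, the special-token ids (borrowed from the original bert-base-uncased vocabulary), and the `-100` ignore label are hypothetical choices.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical ids, taken from the original bert-base-uncased vocabulary for
# illustration only; a domain-specific tokenizer would define its own.
MASK_ID = 103                 # [MASK]
SPECIAL_IDS = {0, 101, 102}   # [PAD], [CLS], [SEP]
IGNORE_INDEX = -100           # label value the loss is told to skip (assumption)


def mask_tokens(
    token_ids: List[int],
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence under BERT-style MLM masking."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Special tokens are never selected; other tokens with prob. mask_prob.
        if tok in SPECIAL_IDS or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


if __name__ == "__main__":
    seq = [101] + list(range(2000, 2020)) + [102]
    inputs, labels = mask_tokens(seq, vocab_size=30522, rng=random.Random(0))
    print(inputs)
    print(labels)
```

Only positions with a label other than `IGNORE_INDEX` contribute to the MLM loss, so the model learns to predict domain-specific tokens from their context. This is the signal that continued pre-training on Indian legal text exploits.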
Related papers
- IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning [16.12863746776168]
Legal systems worldwide are inundated with exponential growth in cases and documents.
There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents.
This paper proposes IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning.
(2024-07-07T14:55:04Z)
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
(2024-06-21T06:19:03Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
(2023-10-13T16:47:20Z)
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in intelligent legal systems.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
(2023-04-22T10:47:01Z)
- Are Models Trained on Indian Legal Data Fair? [20.162205920441895]
We present an initial investigation of fairness from the Indian perspective in the legal domain.
We show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims.
(2023-03-13T16:20:33Z)
- Indian Legal NLP Benchmarks : A Survey [0.0]
There is a need to create separate Natural Language Processing benchmarks for Indian legal text.
This will spur innovation in applications of Natural Language Processing to Indian legal text.
(2021-07-13T13:10:10Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release a Longformer-based pre-trained language model, named Lawformer, for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
(2021-05-09T09:39:25Z)
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset [2.0924876102146714]
We present a new dataset comprising over 53,000 multiple-choice questions to identify the relevant holding of a cited case.
We show that domain pretraining may be warranted when the task exhibits sufficient similarity to the pretraining corpus.
Our findings inform when researchers should engage resource-intensive pretraining and show that Transformer-based architectures, too, learn embeddings suggestive of distinct legal language.
(2021-04-18T00:57:16Z)
- LEGAL-BERT: The Muppets straight out of Law School [52.53830441117363]
We explore approaches for applying BERT models to downstream legal tasks, evaluating on multiple datasets.
Our findings indicate that the previous guidelines for pre-training and fine-tuning, often blindly followed, do not always generalize well in the legal domain.
We release LEGAL-BERT, a family of BERT models intended to assist legal NLP research, computational law, and legal technology applications.
(2020-10-06T09:06:07Z)
- How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence [81.04070052740596]
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.
This paper introduces the history, the current state, and the future directions of research in LegalAI.
(2020-04-25T14:45:15Z)