PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation?
- URL: http://arxiv.org/abs/2403.13681v2
- Date: Thu, 03 Oct 2024 16:01:01 GMT
- Title: PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation?
- Authors: Mitodru Niyogi, Arnab Bhattacharya
- Abstract summary: Paramanu-Ayn is a collection of legal language models trained exclusively on Indian legal case documents.
Paramanu-Ayn was pretrained from scratch with a context size of 8192 on a single GPU for just 185 hours.
- Score: 3.9018931027384056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present Paramanu-Ayn, a collection of legal language models trained exclusively on Indian legal case documents. This 97-million-parameter Auto-Regressive (AR) decoder-only model was pretrained from scratch with a context size of 8192 on a single GPU for just 185 hours, achieving a Model FLOPs Utilization (MFU) of 41.35%. We also developed a legal-domain-specialized BPE tokenizer. We evaluated our model using perplexity and two zero-shot tasks: case judgment prediction with explanation, and abstractive case summarization. On the case judgment prediction with explanation task, Paramanu-Ayn outperformed Llama-2 7B and Gemini-Pro in test accuracy by nearly 2 percentage points, despite being 72 times smaller. In zero-shot abstractive summarization, it surpassed decoder-only LLMs generating fixed-length summaries (5000 tokens) by over 10 percentage points on BLEU and METEOR, and by nearly 4 percentage points on BERTScore. Further evaluations on zero-shot commonsense and mathematical benchmarks showed that Paramanu-Ayn excelled despite being trained exclusively on legal documents, outperforming Llama-1, Llama-2, and Falcon on the AGIEVAL-AQuA-RAT and AGIEVAL-SAT-Math tasks. We also instruction-tuned our model on 10,763 diverse legal tasks, including legal clause generation, legal drafting, and case summarization. The Paramanu-Ayn-instruct model scored above 8 out of 10 on clarity, relevance, completeness, and legal reasoning, as judged by GPT-3.5-Turbo. We found that our models were able to learn drafting knowledge and generalize to drafting legal contracts and clauses with limited instruction tuning. Hence, we conclude that for a strong domain-specialized generative language model (such as a legal one), domain-specialized pretraining from scratch is more cost-effective and environmentally friendly, and remains competitive with, or even outperforms, adapting larger LLMs to legal-domain tasks.
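For context, Model FLOPs Utilization (MFU) measures the fraction of a device's theoretical peak throughput that training actually achieves. A minimal sketch of the standard estimate, using the common approximation of ~6 FLOPs per parameter per training token for decoder-only Transformers; the throughput and peak-FLOPs values below are illustrative placeholders, not the paper's reported configuration:

```python
def mfu(n_params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Estimate Model FLOPs Utilization.

    Training a decoder-only Transformer costs roughly 6 FLOPs per
    parameter per token (forward + backward), so achieved FLOP/s is
    approximately 6 * N * tokens/s.
    """
    achieved = 6.0 * n_params * tokens_per_sec
    return achieved / peak_flops_per_sec

# Illustrative placeholders only (not the paper's reported setup):
n_params = 97e6           # 97M parameters, as stated in the abstract
tokens_per_sec = 50_000   # hypothetical training throughput
peak = 312e12             # e.g., A100 dense BF16 peak FLOP/s

print(f"MFU = {mfu(n_params, tokens_per_sec, peak):.2%}")
```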
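The legal-domain BPE tokenizer could likewise be reproduced in spirit with the HuggingFace tokenizers library. A minimal sketch, assuming a hypothetical plain-text corpus file legal_corpus.txt and an illustrative vocabulary size (the abstract does not state the actual vocabulary size or preprocessing):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# A byte-pair-encoding tokenizer trained only on legal text, so frequent
# legal terms (e.g., "appellant", "jurisdiction") become single tokens.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=32_000,  # illustrative; not the paper's reported size
    special_tokens=["[UNK]", "[PAD]", "<bos>", "<eos>"],
)
tokenizer.train(files=["legal_corpus.txt"], trainer=trainer)
tokenizer.save("legal_bpe.json")
```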
Related papers
- Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, and Beyond [29.03425022434831]
Test-Time Scaling Large Language Models (LLMs) have demonstrated exceptional capabilities across various domains and tasks, particularly in reasoning.
We present a preliminary evaluation of LLMs in various legal scenarios, covering both Chinese and English legal tasks.
Our findings indicate that, despite DeepSeek-R1 and OpenAI o1 being among the most powerful models, their legal reasoning capabilities are still lacking.
arXiv Detail & Related papers (2025-03-20T11:14:39Z)
- NyayaAnumana & INLegalLlama: The Largest Indian Legal Judgment Prediction Dataset and Specialized Language Model for Enhanced Decision Analysis [5.790242888372048]
This paper introduces NyayaAnumana, the largest and most diverse corpus of Indian legal cases compiled for legal judgment prediction (LJP).
NyayaAnumana includes a wide range of cases from the Supreme Court, High Courts, Tribunal Courts, District Courts, and Daily Orders.
In addition to the dataset, we present INLegalLlama, a domain-specific generative large language model (LLM) tailored to the intricacies of the Indian legal system.
arXiv Detail & Related papers (2024-12-11T13:50:17Z)
- Lawma: The Power of Specialization for Legal Tasks [18.45967769381101]
We study 260 legal text classification tasks, nearly all new to the machine learning community.
A lightly fine-tuned Llama 3 model vastly outperforms GPT-4 on almost all tasks, typically by double-digit percentage points.
We find that larger models respond better to fine-tuning than smaller models.
arXiv Detail & Related papers (2024-07-23T16:23:04Z)
- LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain [47.001169623840354]
We aggregate 58 annotated legal datasets and write instructions for each, creating LawInstruct.
LawInstruct covers 17 global jurisdictions, 24 languages and a total of 12M examples across diverse tasks such as legal QA, summarization of court cases, and legal argument mining.
We find that legal-specific instruction tuning on Flan-T5, yielding FLawN-T5, improves performance on LegalBench across all model sizes.
arXiv Detail & Related papers (2024-04-02T17:33:34Z)
- Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement [3.537369004801589]
We study the classification of legal reasoning according to jurisprudential philosophy.
We use a novel dataset of historical United States Supreme Court opinions annotated by a team of domain experts.
We find that generative models perform poorly when given the same instructions as those presented to human annotators.
arXiv Detail & Related papers (2023-10-27T19:27:59Z)
- Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are previous legal cases with similar facts, which serve as the basis for judging subsequent cases in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z)
- Automated Refugee Case Analysis: An NLP Pipeline for Supporting Legal Practitioners [0.0]
We introduce an end-to-end pipeline for retrieving, processing, and extracting targeted information from legal cases.
We investigate an under-studied legal domain with a case study on refugee law in Canada.
arXiv Detail & Related papers (2023-05-24T19:37:23Z)
- Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive performance in low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)
- Toward Adversarial Training on Contextualized Language Representation [78.39805974043321]
This paper investigates adversarial training (AT) from the perspective of the contextualized language representations produced by PLM encoders.
We propose Contextualized representation-Adversarial Training (CreAT), in which the attack is explicitly optimized to make the encoder's contextualized representation deviate.
CreAT produces consistent performance gains on a wider range of tasks and proves more effective for language pre-training where only the encoder is kept for downstream tasks.
arXiv Detail & Related papers (2023-05-08T08:56:51Z)
- Understand Legal Documents with Contextualized Large Language Models [16.416510744265086]
We present our systems for SemEval-2023 Task 6: understanding legal texts.
We first develop the Legal-BERT-HSLN model, which considers comprehensive context information at both the intra- and inter-sentence levels.
We then train a Legal-LUKE model, which is legal-contextualized and entity-aware, to recognize legal entities.
arXiv Detail & Related papers (2023-03-21T18:48:11Z)
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release Lawformer, a Longformer-based pre-trained language model for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z)
- When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset [2.0924876102146714]
We present a new dataset of over 53,000 multiple-choice questions for identifying the relevant holding of a cited case.
We show that domain pretraining may be warranted when the task exhibits sufficient similarity to the pretraining corpus.
Our findings inform when researchers should engage in resource-intensive pretraining and show that Transformer-based architectures, too, learn embeddings suggestive of distinct legal language.
arXiv Detail & Related papers (2021-04-18T00:57:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.