Large Language Models are legal but they are not: Making the case for a
powerful LegalLLM
- URL: http://arxiv.org/abs/2311.08890v1
- Date: Wed, 15 Nov 2023 11:50:10 GMT
- Title: Large Language Models are legal but they are not: Making the case for a
powerful LegalLLM
- Authors: Thanmay Jayakumar, Fauzan Farooqui, Luqman Farooqui
- Abstract summary: The recent surge of Large Language Models (LLMs) has begun to provide new opportunities to apply NLP in the legal domain.
We compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification.
Although the LLMs were not explicitly trained on legal data, we observe that they are still able to classify the theme correctly in most cases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Realizing the recent advances in Natural Language Processing (NLP) to the
legal sector poses challenging problems such as extremely long sequence
lengths, specialized vocabulary that is usually only understood by legal
professionals, and high amounts of data imbalance. The recent surge of Large
Language Models (LLMs) has begun to provide new opportunities to apply NLP in
the legal domain due to their ability to handle lengthy, complex sequences.
Moreover, the emergence of domain-specific LLMs has displayed extremely
promising results on various tasks. In this study, we aim to quantify how
general LLMs perform in comparison to legal-domain models (be it an LLM or
otherwise). Specifically, we compare the zero-shot performance of three
general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR
subset of the LexGLUE benchmark for contract provision classification. Although
the LLMs were not explicitly trained on legal data, we observe that they are
still able to classify the theme correctly in most cases. However, we find that
their mic-F1/mac-F1 performance is up to 19.2/26.8\% lesser than smaller models
fine-tuned on the legal domain, thus underscoring the need for more powerful
legal-domain LLMs.
Related papers
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - BLT: Can Large Language Models Handle Basic Legal Text? [44.89873147675516]
GPT-4 and Claude perform poorly on basic legal text handling.
Poor performance on benchmark casts into doubt their reliability as-is for legal practice.
Fine-tuning on training set brings even a small model to near-perfect performance.
arXiv Detail & Related papers (2023-11-16T09:09:22Z) - A Comprehensive Evaluation of Large Language Models on Legal Judgment
Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z) - Precedent-Enhanced Legal Judgment Prediction with LLM and Domain-Model
Collaboration [52.57055162778548]
Legal Judgment Prediction (LJP) has become an increasingly crucial task in Legal AI.
Precedents are the previous legal cases with similar facts, which are the basis for the judgment of the subsequent case in national legal systems.
Recent advances in deep learning have enabled a variety of techniques to be used to solve the LJP task.
arXiv Detail & Related papers (2023-10-13T16:47:20Z) - LawBench: Benchmarking Legal Knowledge of Large Language Models [35.2812008533622]
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
It is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks.
LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels.
arXiv Detail & Related papers (2023-09-28T09:35:59Z) - Legal Prompt Engineering for Multilingual Legal Judgement Prediction [2.539568419434224]
Legal Prompt Engineering (LPE) or Legal Prompting is a process to guide and assist a large language model (LLM) with performing a natural legal language processing skill.
We investigate the performance of zero-shot LPE for given facts in case-texts from the European Court of Human Rights (in English) and the Federal Supreme Court of Switzerland (in German, French and Italian)
arXiv Detail & Related papers (2022-12-05T12:17:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.