BLT: Can Large Language Models Handle Basic Legal Text?
- URL: http://arxiv.org/abs/2311.09693v2
- Date: Wed, 28 Feb 2024 14:46:25 GMT
- Title: BLT: Can Large Language Models Handle Basic Legal Text?
- Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme
- Abstract summary: GPT-4, Claude, and PaLM 2 perform poorly at basic legal text handling.
Fine-tuning for these tasks brings even a smaller model to near-perfect performance on our test set.
- Score: 50.46167465931653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We find that the best publicly available LLMs like GPT-4, Claude, and
PaLM 2 currently perform poorly at basic legal text handling. We introduce a
benchmark consisting of tasks that lawyers and paralegals would expect LLMs to
handle zero-shot, such as looking up the text at a line of a witness deposition
or at a subsection of a contract. LLMs' poor performance on this benchmark
casts into doubt their reliability as-is for legal practice. However,
fine-tuning for these tasks brings even a smaller model to near-perfect
performance on our test set and also raises performance on a related legal
task. These results suggest that many simple behaviors needed for a domain may
not be present in foundational LLMs, without additional engagement from subject
matter experts.
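To make the benchmark concrete, here is a minimal sketch of what a BLT-style "line lookup" test instance might look like: a numbered deposition transcript, a query for a specific line, and exact-match scoring. The transcript text, numbering format, and function names are illustrative assumptions, not the paper's actual data or code.

```python
# Hypothetical sketch of a BLT-style line-lookup task (illustrative only).

def make_deposition(lines):
    """Render transcript lines with the per-line numbering a deposition page uses."""
    return "\n".join(f"{i + 1}  {text}" for i, text in enumerate(lines))

def gold_answer(lines, n):
    """The exact text a model should return when asked for line n (1-indexed)."""
    return lines[n - 1]

def score(prediction, gold):
    """Exact-match scoring after whitespace normalization."""
    return prediction.strip() == gold.strip()

transcript_lines = [
    "Q. Please state your name for the record.",
    "A. John Smith.",
    "Q. Where were you on the night of March 3rd?",
    "A. I was at home.",
]
prompt = (
    make_deposition(transcript_lines)
    + "\n\nWhat is the exact text of line 3 of this deposition?"
)
# A correct zero-shot response would satisfy:
assert score(
    "Q. Where were you on the night of March 3rd?",
    gold_answer(transcript_lines, 3),
)
```

The paper's finding is that even this kind of mechanical retrieval, which lawyers would expect to work zero-shot, fails surprisingly often for frontier models.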
Related papers
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Better Call GPT, Comparing Large Language Models Against Lawyers [0.0]
This paper dissects whether Large Language Models can outperform humans in accuracy, speed, and cost efficiency during contract review.
In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts.
Cost-wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods.
arXiv Detail & Related papers (2024-01-24T03:53:28Z)
- Large Language Models are legal but they are not: Making the case for a powerful LegalLLM [0.0]
The recent surge of Large Language Models (LLMs) has begun to provide new opportunities to apply NLP in the legal domain.
We compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification.
Although the LLMs were not explicitly trained on legal data, we observe that they are still able to classify the theme correctly in most cases.
arXiv Detail & Related papers (2023-11-15T11:50:10Z)
- LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's performance on law exams raise questions concerning LLMs' performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
- LAiW: A Chinese Legal Large Language Models Benchmark [17.66376880475554]
General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI.
We are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice.
arXiv Detail & Related papers (2023-10-09T11:19:55Z)
- LawBench: Benchmarking Legal Knowledge of Large Language Models [35.2812008533622]
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
It is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks.
LawBench has been meticulously crafted to provide a precise assessment of LLMs' legal capabilities across three cognitive levels.
arXiv Detail & Related papers (2023-09-28T09:35:59Z)
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence [5.07013500385659]
This paper explores Large Language Models' (LLMs) capabilities in applying tax law.
Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release.
Findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels.
arXiv Detail & Related papers (2023-06-12T12:40:48Z)
- Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.