BLT: Can Large Language Models Handle Basic Legal Text?
- URL: http://arxiv.org/abs/2311.09693v2
- Date: Wed, 28 Feb 2024 14:46:25 GMT
- Title: BLT: Can Large Language Models Handle Basic Legal Text?
- Authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme
- Abstract summary: GPT-4, Claude, and PaLM 2 perform poorly at basic legal text handling.
Fine-tuning for these tasks brings even a smaller model to near-perfect performance on our test set.
- Score: 50.46167465931653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We find that the best publicly available LLMs like GPT-4, Claude, and
PaLM 2 currently perform poorly at basic legal text handling. We introduce a
benchmark consisting of tasks that lawyers and paralegals would expect LLMs to
handle zero-shot, such as looking up the text at a line of a witness deposition
or at a subsection of a contract. LLMs' poor performance on this benchmark
casts into doubt their reliability as-is for legal practice. However,
fine-tuning for these tasks brings even a smaller model to near-perfect
performance on our test set and also raises performance on a related legal
task. These results suggest that many simple behaviors needed for a domain may
not be present in foundational LLMs, without additional engagement from subject
matter experts.
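To make the benchmark concrete, here is a minimal sketch of what a BLT-style "line lookup" test instance might look like: a numbered deposition transcript, a query for a specific line, and exact-match scoring. The transcript text, numbering format, and function names are illustrative assumptions, not the paper's actual data or code.

```python
# Hypothetical sketch of a BLT-style line-lookup task (illustrative only).

def make_deposition(lines):
    """Render transcript lines with the per-line numbering a deposition page uses."""
    return "\n".join(f"{i + 1}  {text}" for i, text in enumerate(lines))

def gold_answer(lines, n):
    """The exact text a model should return when asked for line n (1-indexed)."""
    return lines[n - 1]

def score(prediction, gold):
    """Exact-match scoring after whitespace normalization."""
    return prediction.strip() == gold.strip()

transcript_lines = [
    "Q. Please state your name for the record.",
    "A. John Smith.",
    "Q. Where were you on the night of March 3rd?",
    "A. I was at home.",
]
prompt = (
    make_deposition(transcript_lines)
    + "\n\nWhat is the exact text of line 3 of this deposition?"
)
# A correct zero-shot response would satisfy:
assert score(
    "Q. Where were you on the night of March 3rd?",
    gold_answer(transcript_lines, 3),
)
```

The paper's finding is that even this kind of mechanical retrieval, which lawyers would expect to work zero-shot, fails surprisingly often for frontier models.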
Related papers
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z)
- Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
- Better Call GPT, Comparing Large Language Models Against Lawyers [0.0]
This paper dissects whether Large Language Models can outperform humans in accuracy, speed, and cost efficiency during contract review.
In speed, LLMs complete reviews in mere seconds, eclipsing the hours required by their human counterparts.
Cost-wise, LLMs operate at a fraction of the price, offering a staggering 99.97 percent reduction in cost over traditional methods.
arXiv Detail & Related papers (2024-01-24T03:53:28Z)
- Large Language Models are legal but they are not: Making the case for a powerful LegalLLM [0.0]
The recent surge of Large Language Models (LLMs) has begun to provide new opportunities to apply NLP in the legal domain.
We compare the zero-shot performance of three general-purpose LLMs (ChatGPT-20b, LLaMA-2-70b, and Falcon-180b) on the LEDGAR subset of the LexGLUE benchmark for contract provision classification.
Although the LLMs were not explicitly trained on legal data, we observe that they are still able to classify the theme correctly in most cases.
arXiv Detail & Related papers (2023-11-15T11:50:10Z)
- LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's performance on law exams raise questions concerning LLMs' performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z)
- LAiW: A Chinese Legal Large Language Models Benchmark [17.66376880475554]
General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI.
We are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice.
arXiv Detail & Related papers (2023-10-09T11:19:55Z)
- LawBench: Benchmarking Legal Knowledge of Large Language Models [35.2812008533622]
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
It is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks.
LawBench has been meticulously crafted to provide a precise assessment of LLMs' legal capabilities across three cognitive levels.
arXiv Detail & Related papers (2023-09-28T09:35:59Z)
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence [5.07013500385659]
This paper explores Large Language Models' (LLMs) capabilities in applying tax law.
Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release.
Findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels.
arXiv Detail & Related papers (2023-06-12T12:40:48Z)
- Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.