LegalBench: A Collaboratively Built Benchmark for Measuring Legal
Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2308.11462v1
- Date: Sun, 20 Aug 2023 22:08:03 GMT
- Title: LegalBench: A Collaboratively Built Benchmark for Measuring Legal
Reasoning in Large Language Models
- Authors: Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher R\'e, Adam
Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon,
Daniel N. Rockmore, Diego Zambrano, Dmitry Talisman, Enam Hoque, Faiz Surani,
Frank Fagan, Galit Sarfaty, Gregory M. Dickinson, Haggai Porat, Jason
Hegland, Jessica Wu, Joe Nudell, Joel Niklaus, John Nay, Jonathan H. Choi,
Kevin Tobia, Margaret Hagan, Megan Ma, Michael Livermore, Nikon Rasumov-Rahe,
Nils Holzenberger, Noam Kolt, Peter Henderson, Sean Rehaag, Sharad Goel,
Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, and Zehua Li
- Abstract summary: LegalBench is a benchmark consisting of 162 tasks covering six different types of legal reasoning.
This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
- Score: 15.98468948605927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of large language models (LLMs) and their adoption by the legal
community has given rise to the question: what types of legal reasoning can
LLMs perform? To enable greater study of this question, we present LegalBench:
a collaboratively constructed legal reasoning benchmark consisting of 162 tasks
covering six different types of legal reasoning. LegalBench was built through
an interdisciplinary process, in which we collected tasks designed and
hand-crafted by legal professionals. Because these subject matter experts took
a leading role in construction, tasks either measure legal reasoning
capabilities that are practically useful, or measure reasoning skills that
lawyers find interesting. To enable cross-disciplinary conversations about LLMs
in the law, we additionally show how popular legal frameworks for describing
legal reasoning -- which distinguish between its many forms -- correspond to
LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary.
This paper describes LegalBench, presents an empirical evaluation of 20
open-source and commercial LLMs, and illustrates the types of research
explorations LegalBench enables.
Related papers
- Legal Evalutions and Challenges of Large Language Models [42.51294752406578]
We use the OPENAI o1 model as a case study to evaluate the performance of large models in applying legal provisions.
We compare current state-of-the-art LLMs, including open-source, closed-source, and legal-specific models trained specifically for the legal domain.
arXiv Detail & Related papers (2024-11-15T12:23:12Z) - Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration [27.047809869136458]
Large Language Models (LLMs) could struggle to fully understand legal theories and perform legal reasoning tasks.
We introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities.
We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability.
arXiv Detail & Related papers (2024-10-03T14:15:00Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - A Comprehensive Evaluation of Large Language Models on Legal Judgment
Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z) - LAiW: A Chinese Legal Large Language Models Benchmark [17.66376880475554]
General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI.
We are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice.
arXiv Detail & Related papers (2023-10-09T11:19:55Z) - LawBench: Benchmarking Legal Knowledge of Large Language Models [35.2812008533622]
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
It is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks.
LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels.
arXiv Detail & Related papers (2023-09-28T09:35:59Z) - Large Language Models as Tax Attorneys: A Case Study in Legal
Capabilities Emergence [5.07013500385659]
This paper explores Large Language Models' (LLMs) capabilities in applying tax law.
Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release.
Findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels.
arXiv Detail & Related papers (2023-06-12T12:40:48Z) - SAILER: Structure-aware Pre-trained Language Model for Legal Case
Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system.
Most existing language models have difficulty understanding the long-distance dependencies between different structures.
We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z) - A Short Survey of Viewing Large Language Models in Legal Aspect [0.0]
Large language models (LLMs) have transformed many fields, including natural language processing, computer vision, and reinforcement learning.
The integration of LLMs into the legal field has also raised several legal problems, including privacy concerns, bias, and explainability.
arXiv Detail & Related papers (2023-03-16T08:01:22Z) - Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents [56.40163943394202]
We release the Longformer-based pre-trained language model, named as Lawformer, for Chinese legal long documents understanding.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
arXiv Detail & Related papers (2021-05-09T09:39:25Z) - How Does NLP Benefit Legal System: A Summary of Legal Artificial
Intelligence [81.04070052740596]
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.
This paper introduces the history, the current state, and the future directions of research in LegalAI.
arXiv Detail & Related papers (2020-04-25T14:45:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.