LAiW: A Chinese Legal Large Language Models Benchmark
- URL: http://arxiv.org/abs/2310.05620v2
- Date: Sun, 18 Feb 2024 05:36:14 GMT
- Title: LAiW: A Chinese Legal Large Language Models Benchmark
- Authors: Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang
- Abstract summary: General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI.
We are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General and legal domain LLMs have demonstrated strong performance in various
tasks of LegalAI. However, current evaluations of these LLMs in LegalAI are
defined by computer science experts and lack consistency with the logic
of legal practice, making it difficult to judge their practical capabilities.
To address this challenge, we are the first to build the Chinese legal LLMs
benchmark LAiW, based on the logic of legal practice. To align with the
thinking process of legal experts and legal practice (syllogism), we divide the
legal capabilities of LLMs from easy to difficult into three levels: basic
information retrieval, legal foundation inference, and complex legal
application. Each level contains multiple tasks to ensure a comprehensive
evaluation. Through automated evaluation of current general and legal-domain
LLMs on our benchmark, we find that these LLMs may not align with the logic
of legal practice. LLMs seem to be able to directly acquire complex legal
application capabilities but perform poorly in some basic tasks, which may pose
obstacles to their practical application and acceptance by legal experts. To
further confirm the complex legal application capabilities of current LLMs in
legal application scenarios, we also incorporate human evaluation with legal
experts. The results indicate that while LLMs may demonstrate strong
performance, they still require reinforcement of legal logic.
Related papers
- Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration
Large Language Models (LLMs) could struggle to fully understand legal theories and perform legal reasoning tasks.
We introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities.
We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability.
(2024-10-03T14:15:00Z)
- InternLM-Law: An Open Source Chinese Legal Large Language Model
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
(2024-06-21T06:19:03Z)
- A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law
Large language models (LLMs) are revolutionizing the landscapes of finance, healthcare, and law.
We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies.
We critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems.
(2024-05-02T22:43:02Z)
- BLT: Can Large Language Models Handle Basic Legal Text?
GPT-4 and Claude perform poorly on basic legal text handling.
Their poor performance on the benchmark casts doubt on their reliability, as-is, for legal practice.
Fine-tuning on the training set brings even a small model to near-perfect performance.
(2023-11-16T09:09:22Z)
- A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions about LLMs' performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
(2023-10-18T07:38:04Z)
- LawBench: Benchmarking Legal Knowledge of Large Language Models
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
It is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks.
LawBench has been meticulously crafted to provide a precise assessment of LLMs' legal capabilities at three cognitive levels.
(2023-09-28T09:35:59Z)
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
This paper explores Large Language Models' (LLMs) capabilities in applying tax law.
Our experiments demonstrate emerging legal understanding capabilities, with improved performance in each subsequent OpenAI model release.
Findings indicate that LLMs, particularly when combined with prompting enhancements and the correct legal texts, can perform at high levels of accuracy but not yet at expert tax lawyer levels.
(2023-06-12T12:40:48Z)
- A Short Survey of Viewing Large Language Models in Legal Aspect
Large language models (LLMs) have transformed many fields, including natural language processing, computer vision, and reinforcement learning.
The integration of LLMs into the legal field has also raised several legal problems, including privacy concerns, bias, and explainability.
(2023-03-16T08:01:22Z)
- Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
We release Lawformer, a Longformer-based pre-trained language model for understanding long Chinese legal documents.
We evaluate Lawformer on a variety of LegalAI tasks, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
(2021-05-09T09:39:25Z)
- How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.
This paper introduces the history, the current state, and the future directions of research in LegalAI.
(2020-04-25T14:45:15Z)