CitaLaw: Enhancing LLM with Citations in Legal Domain
- URL: http://arxiv.org/abs/2412.14556v1
- Date: Thu, 19 Dec 2024 06:14:20 GMT
- Title: CitaLaw: Enhancing LLM with Citations in Legal Domain
- Authors: Kepu Zhang, Weijie Yu, Sunhao Dai, Jun Xu,
- Abstract summary: We propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations.
CitaLaw features a diverse set of legal questions for both laypersons and practitioners, paired with a comprehensive corpus of law articles and precedent cases as a reference pool.
- Score: 5.249003454314636
- License:
- Abstract: In this paper, we propose CitaLaw, the first benchmark designed to evaluate LLMs' ability to produce legally sound responses with appropriate citations. CitaLaw features a diverse set of legal questions for both laypersons and practitioners, paired with a comprehensive corpus of law articles and precedent cases as a reference pool. This framework enables LLM-based systems to retrieve supporting citations from the reference corpus and align these citations with the corresponding sentences in their responses. Moreover, we introduce syllogism-inspired evaluation methods to assess the legal alignment between retrieved references and LLM-generated responses, as well as their consistency with user questions. Extensive experiments on 2 open-domain and 7 legal-specific LLMs demonstrate that integrating legal references substantially enhances response quality. Furthermore, our proposed syllogism-based evaluation method exhibits strong agreement with human judgments.
Related papers
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain.
LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z) - Methods for Legal Citation Prediction in the Age of LLMs: An Australian Law Case Study [9.30538764385435]
We focus on the problem of legal citation prediction within the Australian law context, where correctly identifying and citing relevant legislations or precedents is critical.
Our findings indicate that domain-specific pre-training alone is insufficient for achieving satisfactory citation accuracy even after law-specialised pre-training.
In contrast, instruction tuning on our task-specific dataset dramatically boosts performance reaching the best results across all settings.
arXiv Detail & Related papers (2024-12-09T07:46:14Z) - Legal Evalutions and Challenges of Large Language Models [42.51294752406578]
We use the OPENAI o1 model as a case study to evaluate the performance of large models in applying legal provisions.
We compare current state-of-the-art LLMs, including open-source, closed-source, and legal-specific models trained specifically for the legal domain.
arXiv Detail & Related papers (2024-11-15T12:23:12Z) - ELLA: Empowering LLMs for Interpretable, Accurate and Informative Legal Advice [26.743016561520506]
ELLA is a tool for bf Empowering bf LLMs for interpretable, accurate, and informative bf Legal bf Advice.
arXiv Detail & Related papers (2024-08-13T18:12:00Z) - Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
RAG has been widely adopted to enhance Large Language Models (LLMs)
Attributed Text Generation (ATG) has attracted growing attention, which provides citations to support the model's responses in RAG.
This paper proposes a fine-grained ATG method called ReClaim(Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z) - InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z) - DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation [75.81096662788254]
Large Language Models (LLMs) are scalable and economical evaluators.
The question of how reliable these evaluators are has emerged as a crucial research question.
We propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices.
arXiv Detail & Related papers (2024-05-24T08:12:30Z) - (A)I Am Not a Lawyer, But...: Engaging Legal Experts towards Responsible LLM Policies for Legal Advice [8.48013392781081]
Large language models (LLMs) are increasingly capable of providing users with advice in a wide range of professional domains, including legal advice.
We conducted workshops with 20 legal experts using methods inspired by case-based reasoning.
Our findings reveal novel legal considerations, such as unauthorized practice of law, confidentiality, and liability for inaccurate advice.
arXiv Detail & Related papers (2024-02-02T19:35:34Z) - Effective Large Language Model Adaptation for Improved Grounding and Citation Generation [48.07830615309543]
This paper focuses on improving large language models (LLMs) by grounding their responses in retrieved passages and by providing citations.
We propose a new framework, AGREE, that improves the grounding from a holistic perspective.
Our framework tunes LLMs to selfground the claims in their responses and provide accurate citations to retrieved documents.
arXiv Detail & Related papers (2023-11-16T03:22:25Z) - A Comprehensive Evaluation of Large Language Models on Legal Judgment
Prediction [60.70089334782383]
Large language models (LLMs) have demonstrated great potential for domain-specific applications.
Recent disputes over GPT-4's law evaluation raise questions concerning their performance in real-world legal tasks.
We design practical baseline solutions based on LLMs and test on the task of legal judgment prediction.
arXiv Detail & Related papers (2023-10-18T07:38:04Z) - LAiW: A Chinese Legal Large Language Models Benchmark [17.66376880475554]
General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI.
We are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice.
arXiv Detail & Related papers (2023-10-09T11:19:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.