LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
- URL: http://arxiv.org/abs/2410.09527v1
- Date: Sat, 12 Oct 2024 13:16:51 GMT
- Title: LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
- Authors: T. Y. S. S. Santosh, Cornelius Weiss, Matthias Grabmair
- Abstract summary: This work curates LexSumm, a benchmark designed for evaluating legal summarization tasks in English.
It comprises eight English legal summarization datasets from diverse jurisdictions such as the US, UK, EU, and India.
We release LexT5, a legal-oriented sequence-to-sequence model, addressing the limitations of existing BERT-style encoder-only models in the legal domain.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress. However, existing Legal NLP benchmarks focus only on predictive tasks, overlooking generative tasks. This work curates LexSumm, a benchmark designed for evaluating legal summarization tasks in English. It comprises eight English legal summarization datasets from diverse jurisdictions such as the US, UK, EU, and India. Additionally, we release LexT5, a legal-oriented sequence-to-sequence model, addressing the limitations of existing BERT-style encoder-only models in the legal domain. We assess its capabilities through zero-shot probing on LegalLAMA and fine-tuning on LexSumm. Our analysis reveals abstraction and faithfulness errors even in summaries generated by zero-shot LLMs, indicating opportunities for further improvement. The LexSumm benchmark and LexT5 model are available at https://github.com/TUMLegalTech/LexSumm-LexT5.
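For readers who want to try the released model, a minimal sketch of loading a T5-style seq2seq checkpoint for summarization is shown below. The Hugging Face hub ID and the "summarize:" prompt prefix are illustrative assumptions, not confirmed usage; consult the linked repository for the actual checkpoints and recommended prompting.

```python
# Minimal sketch: summarizing a legal document with a T5-style seq2seq model.
# NOTE: "TUMLegalTech/LexT5" is a placeholder hub ID assumed for illustration;
# see https://github.com/TUMLegalTech/LexSumm-LexT5 for the real checkpoints.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "TUMLegalTech/LexT5"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

document = "The appellant contends that the lower court erred in ..."
# "summarize:" is the generic T5 task prefix; LexT5 may use a different format.
inputs = tokenizer("summarize: " + document, return_tensors="pt",
                   truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```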
Related papers
- LegalAgentBench: Evaluating LLM Agents in Legal Domain [53.70993264644004]
LegalAgentBench is a benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain.
LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge.
arXiv Detail & Related papers (2024-12-23T04:02:46Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models [17.90483181611453]
Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain.
Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice.
We introduce a standardized comprehensive Chinese legal benchmark LexEval.
arXiv Detail & Related papers (2024-09-30T13:44:00Z)
- InternLM-Law: An Open Source Chinese Legal Large Language Model [72.2589401309848]
InternLM-Law is a specialized LLM tailored for addressing diverse legal queries related to Chinese laws.
We meticulously construct a dataset in the Chinese legal domain, encompassing over 1 million queries.
InternLM-Law achieves the highest average performance on LawBench, outperforming state-of-the-art models, including GPT-4, on 13 out of 20 subtasks.
arXiv Detail & Related papers (2024-06-21T06:19:03Z)
- LawInstruct: A Resource for Studying Language Model Adaptation to the Legal Domain [47.001169623840354]
We aggregate 58 annotated legal datasets and write instructions for each, creating LawInstruct.
LawInstruct covers 17 global jurisdictions, 24 languages and a total of 12M examples across diverse tasks such as legal QA, summarization of court cases, and legal argument mining.
We find that legal-specific instruction tuning on Flan-T5 (yielding FLawN-T5) improves performance on LegalBench across all model sizes.
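The core idea is pairing each raw dataset example with a natural-language instruction. A minimal sketch of that formatting step, assuming Flan-style prompt/completion records, is below; the field names and instruction text are illustrative assumptions, not the paper's actual schema.

```python
# Minimal sketch of the instruction-formatting idea behind LawInstruct:
# wrap each raw legal example in a natural-language instruction record.
# Field names and instruction wording are illustrative assumptions.
def to_instruction_example(instruction: str, inputs: str, target: str) -> dict:
    """Wrap a raw example in a Flan-style prompt/completion record."""
    return {
        "prompt": f"{instruction}\n\n{inputs}",
        "completion": target,
    }

raw = {
    "question": "Can a contract be enforced without consideration?",
    "answer": "Generally no; consideration is required ...",
}
record = to_instruction_example(
    instruction="Answer the following legal question.",
    inputs=raw["question"],
    target=raw["answer"],
)
```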
arXiv Detail & Related papers (2024-04-02T17:33:34Z)
- BLT: Can Large Language Models Handle Basic Legal Text? [44.89873147675516]
GPT-4 and Claude perform poorly on basic legal text handling.
Their poor performance on this benchmark casts doubt on their as-is reliability for legal practice.
Fine-tuning on the benchmark's training set brings even a small model to near-perfect performance.
arXiv Detail & Related papers (2023-11-16T09:09:22Z)
- SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore [159.21914121143885]
We present SILO, a new language model that manages this risk-performance tradeoff during inference.
SILO is built by (1) training a parametric LM on the Open License Corpus (OLC), a new corpus we curate with 228B tokens of public domain and permissively licensed text, and (2) augmenting it with a more general and easily modifiable nonparametric datastore that is only queried during inference.
Access to the datastore greatly improves out-of-domain performance, closing 90% of the performance gap with an LM trained on the Pile.
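One common way to realize such a nonparametric datastore is a kNN-LM-style interpolation: retrieved neighbors from the datastore induce a next-token distribution that is mixed with the parametric LM's distribution. The sketch below is a rough illustration of that general mechanism, not SILO's actual code; the interpolation weight is an assumed value.

```python
import numpy as np

def knn_lm_next_token_probs(p_lm: np.ndarray,
                            neighbor_tokens: np.ndarray,
                            neighbor_dists: np.ndarray,
                            vocab_size: int,
                            lam: float = 0.25) -> np.ndarray:
    """Interpolate the parametric LM's next-token distribution with a
    kNN distribution built from retrieved datastore neighbors.

    p_lm: LM next-token distribution, shape (V,)
    neighbor_tokens: token ids continuing the k retrieved contexts, shape (k,)
    neighbor_dists: distances of retrieved contexts to the query, shape (k,)
    lam: weight on the nonparametric component (assumed value)
    """
    # Softmax over negative distances -> weights on retrieved neighbors.
    w = np.exp(-neighbor_dists)
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_tokens, w)  # aggregate weight per token id
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage: vocab of 5, three retrieved neighbors.
p_lm = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
probs = knn_lm_next_token_probs(p_lm,
                                neighbor_tokens=np.array([2, 2, 4]),
                                neighbor_dists=np.array([0.5, 0.7, 1.2]),
                                vocab_size=5)
```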
arXiv Detail & Related papers (2023-08-08T17:58:15Z)
- One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support [18.810320088441678]
This work introduces a novel NLP benchmark for the legal domain.
It challenges LLMs in five key dimensions, including processing long documents (up to 50K tokens), using domain-specific knowledge (embodied in legal texts), and multilingual understanding (covering five languages).
Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual legal system.
arXiv Detail & Related papers (2023-06-15T16:19:15Z)
- LexGLUE: A Benchmark Dataset for Legal Language Understanding in English [15.026117429782996]
We introduce the Legal General Language Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks.
We also provide an evaluation and analysis of several generic and legal-oriented models demonstrating that the latter consistently offer performance improvements across multiple tasks.
arXiv Detail & Related papers (2021-10-03T10:50:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including this list) and is not responsible for any consequences arising from its use.