An Evaluation Framework for Legal Document Summarization
- URL: http://arxiv.org/abs/2205.08478v1
- Date: Tue, 17 May 2022 16:42:03 GMT
- Title: An Evaluation Framework for Legal Document Summarization
- Authors: Ankan Mullick, Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, R Raghav, Roshni Kar
- Abstract summary: A law practitioner has to read through numerous lengthy legal case proceedings across various categories, such as land disputes and corruption.
It is important to summarize these documents and to ensure that the summaries contain phrases whose intent matches the category of the case.
We propose an automated intent-based summarization metric, which shows better agreement with human evaluation than other automated metrics such as BLEU and ROUGE-L.
- Score: 1.9709122688953327
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A law practitioner has to read through numerous lengthy legal case
proceedings across various categories, such as land disputes and corruption.
Hence, it is important to summarize these documents and to ensure that the
summaries contain phrases whose intent matches the category of the case. To the
best of our knowledge, no existing evaluation metric evaluates a summary based
on its intent. We propose an automated intent-based summarization metric, which
shows better agreement with human evaluation, in terms of human satisfaction,
than other automated metrics such as BLEU and ROUGE-L. We also curate a dataset
by annotating intent phrases in legal documents, and show a proof of concept of
how this system can be automated. All the code and data needed to reproduce our
results are available on GitHub.
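As a concrete illustration of the general idea, the minimal sketch below scores
a summary by its coverage of annotated intent phrases. This is an
assumption-laden reading of the abstract: the `intent_overlap_score` function,
its substring matching, and its F1-style aggregation are hypothetical stand-ins,
not the metric actually proposed in the paper (see the authors' GitHub
repository for that).

```python
# Minimal sketch of an intent-phrase coverage score for a generated summary.
# The matching rule and the F1-style aggregation below are illustrative
# assumptions, not the paper's actual metric definition.

def intent_overlap_score(summary: str, intent_phrases: list[str]) -> float:
    """F1-style score between annotated intent phrases and a summary.

    `intent_phrases` stands in for the case-category intent phrases
    annotated in the source document (e.g., for a land-dispute case).
    """
    if not intent_phrases:
        return 0.0
    summary_lower = summary.lower()
    # Recall: fraction of annotated intent phrases that appear in the summary.
    matched = [p for p in intent_phrases if p.lower() in summary_lower]
    recall = len(matched) / len(intent_phrases)
    # Precision proxy: share of summary tokens covered by matched phrases.
    summary_tokens = summary_lower.split()
    covered = sum(len(p.split()) for p in matched)
    precision = min(covered / max(len(summary_tokens), 1), 1.0)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    phrases = ["possession of the disputed land", "boundary demarcation"]
    summary = ("The court ruled on possession of the disputed land after "
               "reviewing the boundary demarcation survey.")
    print(f"intent overlap: {intent_overlap_score(summary, phrases):.3f}")
```

In the paper's setup, such a score would then be checked for agreement with
human satisfaction judgments and compared against BLEU and ROUGE-L.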
Related papers
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Evaluating Code Summarization Techniques: A New Metric and an Empirical Characterization [16.127739014966487]
We investigate the complementarity of different types of metrics in capturing the quality of a generated summary.
We present a new metric based on contrastive learning to capture said aspect.
arXiv Detail & Related papers (2023-12-24T13:12:39Z)
- OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization [52.720711541731205]
We present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models.
Our findings indicate that metrics based on neural networks generally outperform non-neural ones.
arXiv Detail & Related papers (2023-10-27T13:09:54Z)
- MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness.
Existing SCR datasets only focus on the fact description section when judging the similarity between cases.
We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement, with comprehensive sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
- LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation [28.438103177230477]
We evaluate the efficacy of automatic metrics for assessing the factual consistency of long document text summarisation.
We propose a new evaluation framework, LongDocFACTScore, which is suitable for evaluating long document summarisation data sets.
arXiv Detail & Related papers (2023-09-21T19:54:54Z)
- Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs).
We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.
Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
arXiv Detail & Related papers (2023-05-24T06:19:14Z)
- Court Judgement Labeling on HKLII [17.937279252256594]
HKLII has served as the repository of legal documents in Hong Kong for a decade.
Our team aims to incorporate NLP techniques into the website to make it more intelligent.
arXiv Detail & Related papers (2022-08-03T06:32:16Z)
- Fine-grained Intent Classification in the Legal Domain [2.088409822555567]
We introduce a dataset of 93 legal documents, belonging to the case categories of either Murder, Land Dispute, Robbery, or Corruption.
We annotate fine-grained intents for each intent phrase to enable a deeper understanding of the case for a reader.
We analyze the performance of several transformer-based models in automating the process of extracting intent phrases.
arXiv Detail & Related papers (2022-05-06T23:57:17Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
arXiv Detail & Related papers (2020-12-09T22:45:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.