Related papers: Incorporating Domain Knowledge for Extractive Summarization of Legal Case Documents

URL: http://arxiv.org/abs/2106.15876v1
Date: Wed, 30 Jun 2021 08:06:15 GMT
Title: Incorporating Domain Knowledge for Extractive Summarization of Legal Case Documents
Authors: Paheli Bhattacharya and Soham Poddar and Koustav Rudra and Kripabandhu Ghosh and Saptarshi Ghosh
Abstract summary: We propose an unsupervised summarization algorithm DELSumm for summarizing legal case documents. Our proposed algorithm outperforms several supervised summarization models that are trained over thousands of document-summary pairs.
Score: 7.6340456946456605
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automatic summarization of legal case documents is an important and practical challenge. Apart from many domain-independent text summarization algorithms that can be used for this purpose, several algorithms have been developed specifically for summarizing legal case documents. However, most of the existing algorithms do not systematically incorporate domain knowledge that specifies what information should ideally be present in a legal case document summary. To address this gap, we propose an unsupervised summarization algorithm DELSumm which is designed to systematically incorporate guidelines from legal experts into an optimization setup. We conduct detailed experiments over case documents from the Indian Supreme Court. The experiments show that our proposed unsupervised method outperforms several strong baselines in terms of ROUGE scores, including both general summarization algorithms and legal-specific ones. In fact, though our proposed algorithm is unsupervised, it outperforms several supervised summarization models that are trained over thousands of document-summary pairs.

Related papers

Labeling Case Similarity based on Co-Citation of Legal Articles in Judgment Documents with Empirical Dispute-Based Evaluation [0.9902389530203038]
We propose a new approach leveraging the co-citation of legal articles within cases to establish similarity and enable algorithmic annotation. We employ a system that recommends similar cases based on plaintiffs' accusations, defendants' rebuttals, and points of dispute. The evaluation demonstrates that the recommender, with finetuned text embedding models and a reasonable BiLSTM module, can recommend labor cases whose similarity was measured by the co-citation of the legal articles.
arXiv Detail & Related papers (2025-04-29T00:26:37Z)
A Law Reasoning Benchmark for LLM with Tree-Organized Structures including Factum Probandum, Evidence and Experiences [76.73731245899454]
We propose a transparent law reasoning schema enriched with hierarchical factum probandum, evidence, and implicit experience. Inspired by this schema, we introduce a challenging task that takes a textual case description and outputs a hierarchical structure justifying the final decision. This benchmark paves the way for transparent and accountable AI-assisted law reasoning in the "Intelligent Court".
arXiv Detail & Related papers (2025-03-02T10:26:54Z)
AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction [56.797874973414636]
AnnoCaseLaw is a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Our dataset lays the groundwork for more human-aligned, explainable Legal Judgment Prediction models. Results demonstrate that LJP remains a formidable task, with application of legal precedent proving particularly difficult.
arXiv Detail & Related papers (2025-02-28T19:14:48Z)
CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation [22.98779736851499]
We introduce CaseGen, the benchmark for multi-stage legal case documents generation in the Chinese legal domain. CaseGen is based on 500 real case samples annotated by legal experts and covers seven essential case sections. It supports four key tasks: drafting defense statements, writing trial facts, composing legal reasoning, and generating judgment results.
arXiv Detail & Related papers (2025-02-25T08:03:32Z)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance. We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods. In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
LawLLM: Law Large Language Model for the US Legal System [43.13850456765944]
We introduce the Law Large Language Model (LawLLM), a multi-task model specifically designed for the US legal domain. LawLLM excels at Similar Case Retrieval (SCR), Precedent Case Recommendation (PCR), and Legal Judgment Prediction (LJP). We propose customized data preprocessing techniques for each task that transform raw legal data into a trainable format.
arXiv Detail & Related papers (2024-07-27T21:51:30Z)
DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval. We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
A Deep Learning-Based System for Automatic Case Summarization [2.9141777969894966]
This paper presents a deep learning-based system for efficient automatic case summarization. The system offers both supervised and unsupervised methods to generate concise and relevant summaries of lengthy legal case documents. Future work will focus on refining summarization techniques and exploring the application of our methods to other types of legal texts.
arXiv Detail & Related papers (2023-12-13T01:18:10Z)
MUSER: A Multi-View Similar Case Retrieval Dataset [65.36779942237357]
Similar case retrieval (SCR) is a representative legal AI application that plays a pivotal role in promoting judicial fairness. Existing SCR datasets only focus on the fact description section when judging the similarity between cases. We present MUSER, a similar case retrieval dataset based on multi-view similarity measurement and comprehensive legal element with sentence-level legal element annotations.
arXiv Detail & Related papers (2023-10-24T08:17:11Z)
An Intent Taxonomy of Legal Case Retrieval [43.22489520922202]
Legal case retrieval is a special Information Retrieval (IR) task focusing on legal case documents. We present a novel hierarchical intent taxonomy of legal case retrieval. We reveal significant differences in user behavior and satisfaction under different search intents in legal case retrieval.
arXiv Detail & Related papers (2023-07-25T07:27:32Z)
SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval [75.05173891207214]
Legal case retrieval plays a core role in the intelligent legal system. Most existing language models have difficulty understanding the long-distance dependencies between different structures. We propose a new Structure-Aware pre-traIned language model for LEgal case Retrieval.
arXiv Detail & Related papers (2023-04-22T10:47:01Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Computing and Exploiting Document Structure to Improve Unsupervised Extractive Summarization of Legal Case Decisions [7.99536002595393]
We propose an unsupervised graph-based ranking model that uses a reweighting algorithm to exploit document structure. Results on the Canadian Legal Case Law dataset show that our proposed method outperforms several strong baselines.
arXiv Detail & Related papers (2022-11-06T22:20:42Z)
Legal Case Document Summarization: Extractive and Abstractive Methods and their Evaluation [11.502115682980559]
Summarization of legal case judgement documents is a challenging problem in Legal NLP. Few analyses exist on how different families of summarization models perform when applied to legal case documents.
arXiv Detail & Related papers (2022-10-14T05:43:08Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.