VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law
- URL: http://arxiv.org/abs/2108.10120v1
- Date: Mon, 23 Aug 2021 12:41:41 GMT
- Title: VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law
- Authors: Julien Rossi, Svitlana Vakulenko, Evangelos Kanoulas
- Abstract summary: This paper presents a new dataset that consists of the citation graph of court opinions.
We focus on the verbatim quotes, where the text of the original opinion is directly reused.
We introduce the task of highlight extraction as a single-document summarization task based on the citation graph.
- Score: 12.080138272647144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Citing legal opinions is a key part of legal argumentation, an expert task
that requires retrieval, extraction and summarization of information from court
decisions. The identification of legally salient parts in an opinion for the
purpose of citation may be seen as a domain-specific formulation of a highlight
extraction or passage retrieval task. While similar tasks in other domains, such
as web search, have received significant attention and seen improvement, progress
in the legal domain is hindered by the lack of resources for training and evaluation.
This paper presents a new dataset that consists of the citation graph of
court opinions, which cite previously published court opinions in support of
their arguments. In particular, we focus on the verbatim quotes, i.e., where
the text of the original opinion is directly reused.
With this approach, we explain the relative importance of different text
spans of a court opinion by showcasing their usage in citations, and measuring
their contribution to the relations between opinions in the citation graph.
We release VerbCL, a large-scale dataset derived from CourtListener, and
introduce the task of highlight extraction as a single-document summarization
task based on the citation graph, establishing the first baseline results for
this task on the VerbCL dataset.
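The idea of scoring spans of an opinion by how often later opinions quote them verbatim can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the quotation-mark heuristic, the sentence-level granularity, and the `min_words` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_quotes(citing_text: str, min_words: int = 5) -> List[str]:
    """Return normalized spans found between double quotes (straight or curly).

    Assumption: a quoted span of at least `min_words` words is a candidate
    verbatim quote; shorter spans are ignored as likely terms of art.
    """
    spans = re.findall('["\u201c]([^"\u201c\u201d]+)["\u201d]', citing_text)
    return [normalize(s) for s in spans if len(s.split()) >= min_words]


def rank_highlights(
    cited_sentences: List[str], citing_texts: List[str], min_words: int = 5
) -> List[Tuple[str, int]]:
    """Count, for each sentence of the cited opinion, how many citing
    opinions contain a quote that appears verbatim inside that sentence."""
    counts: Counter = Counter()
    normalized = [normalize(s) for s in cited_sentences]
    for citing in citing_texts:
        quotes = extract_quotes(citing, min_words)
        for idx, sentence in enumerate(normalized):
            if any(q in sentence for q in quotes):
                counts[idx] += 1
    # Sentences never quoted are omitted; the rest are sorted by quote count.
    return [(cited_sentences[i], n) for i, n in counts.most_common()]


# Toy data (invented for this example, not taken from VerbCL).
cited_sentences = [
    "The statute must be read in light of its purpose.",
    "Jurisdiction was not contested by the parties.",
    "A contract requires offer, acceptance, and consideration.",
]
citing_texts = [
    'As held earlier, "the statute must be read in light of its purpose." We agree.',
    'The court noted that the "statute must be read in light of its purpose" '
    'and that "a contract requires offer, acceptance, and consideration."',
    "No quotation here, only a bare citation.",
]

ranked = rank_highlights(cited_sentences, citing_texts)
for sentence, n in ranked:
    print(n, sentence)
```

In this toy setting, the sentence quoted by two citing opinions ranks first, and the sentence that is never quoted is left out. A realistic system would also need to handle block quotes, ellipses, and bracketed alterations, which this sketch ignores.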
Related papers
- Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval [0.0]
We propose a set of submodular functions for opinion summarization.
Opinion summarization has built in it the tasks of summarization and sentiment detection.
Our functions generate summaries such that there is a good correlation between document sentiment and summary sentiment, along with a good ROUGE score.
arXiv Detail & Related papers (2024-05-20T21:27:18Z)
- DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment [55.91429725404988]
We introduce DELTA, a discriminative model designed for legal case retrieval.
We leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability.
Our approach can outperform existing state-of-the-art methods in legal case retrieval.
arXiv Detail & Related papers (2024-03-27T10:40:14Z)
- Aspect-based Meeting Transcript Summarization: A Two-Stage Approach with Weak Supervision on Sentence Classification [91.13086984529706]
Aspect-based meeting transcript summarization aims to produce multiple summaries.
Traditional summarization methods produce one summary mixing information of all aspects.
We propose a two-stage method for aspect-based meeting transcript summarization.
arXiv Detail & Related papers (2023-11-07T19:06:31Z)
- Analysing the Resourcefulness of the Paragraph for Precedence Retrieval [0.1761604268733064]
We analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments for improving the performance of precedence retrieval.
We found that the paragraph-level methods could capture the similarity among the judgments with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method.
arXiv Detail & Related papers (2023-07-29T08:55:38Z)
- CiteCaseLAW: Citation Worthiness Detection in Caselaw for Legal Assistive Writing [44.75251805925605]
We introduce a labeled dataset of 178M sentences for citation-worthiness detection in the legal domain from the Caselaw Access Project (CAP).
The performance of various deep learning models was examined on this novel dataset.
The domain-specific pre-trained model tends to outperform other models, with an 88% F1-score for the citation-worthiness detection task.
arXiv Detail & Related papers (2023-05-03T04:20:56Z)
- Inline Citation Classification using Peripheral Context and Time-evolving Augmentation [23.88211560188731]
We propose a new dataset, named 3Cext, which provides discourse information using the cited sentences.
We propose PeriCite, a Transformer-based deep neural network that fuses peripheral sentences and domain knowledge.
arXiv Detail & Related papers (2023-03-01T09:11:07Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- Generating Fact Checking Summaries for Web Claims [8.980876474818153]
We present a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents.
We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.
arXiv Detail & Related papers (2020-10-16T18:10:47Z)