CiteBench: A benchmark for Scientific Citation Text Generation
- URL: http://arxiv.org/abs/2212.09577v3
- Date: Fri, 3 Nov 2023 19:55:56 GMT
- Title: CiteBench: A benchmark for Scientific Citation Text Generation
- Authors: Martin Funkquist, Ilia Kuznetsov, Yufang Hou and Iryna Gurevych
- Abstract summary: CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
- Score: 69.37571393032026
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Science progresses by building upon the prior body of knowledge documented in
scientific publications. The acceleration of research makes it hard to stay
up-to-date with recent developments and to summarize the ever-growing body
of prior work. To address this, the task of citation text generation aims to
produce accurate textual summaries given a set of papers-to-cite and the citing
paper context. Because explicit anchoring of cited documents in the citing
paper is otherwise rare, citation text generation provides an excellent
opportunity to study how humans aggregate and synthesize textual knowledge from
sources. Yet,
existing studies are based upon widely diverging task definitions, which makes
it hard to study this task systematically. To address this challenge, we
propose CiteBench: a benchmark for citation text generation that unifies
multiple diverse datasets and enables standardized evaluation of citation text
generation models across task designs and domains. Using the new benchmark, we
investigate the performance of multiple strong baselines, test their
transferability between the datasets, and deliver new insights into the task
definition and evaluation to guide future research in citation text generation.
We make the code for CiteBench publicly available at
https://github.com/UKPLab/citebench.
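To make the task setup concrete, the sketch below illustrates one possible shape of a citation text generation example, a toy baseline, and an overlap metric in Python. It is a minimal illustration only: the data fields (cited_abstracts, context_before, context_after, target_citation), the extractive baseline, and the unigram-F1 stand-in metric are assumptions for exposition and do not reflect the actual CiteBench data format or evaluation code.

# Hypothetical illustration of the citation text generation setup described above;
# field names and the toy baseline are assumptions, not the CiteBench API.
from dataclasses import dataclass
from typing import List
from collections import Counter

@dataclass
class CitationExample:
    cited_abstracts: List[str]   # abstracts of the papers-to-cite
    context_before: str          # citing-paper text preceding the citation
    context_after: str = ""      # citing-paper text following the citation
    target_citation: str = ""    # human-written citation text (reference)

def extractive_baseline(ex: CitationExample, max_words: int = 30) -> str:
    """Toy baseline: copy the opening words of the first cited abstract."""
    return " ".join(ex.cited_abstracts[0].split()[:max_words])

def unigram_f1(reference: str, prediction: str) -> float:
    """Crude ROUGE-1-style unigram overlap F1, used here only as a stand-in metric."""
    ref, pred = Counter(reference.lower().split()), Counter(prediction.lower().split())
    overlap = sum((ref & pred).values())
    if not overlap:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ex = CitationExample(
        cited_abstracts=["We introduce a benchmark that unifies citation text generation datasets."],
        context_before="Prior work on related-work generation differs widely in task setup.",
        target_citation="Recent work unifies citation text generation datasets into one benchmark.",
    )
    pred = extractive_baseline(ex)
    print(f"prediction: {pred}")
    print(f"unigram F1 vs. reference: {unigram_f1(ex.target_citation, pred):.3f}")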
Related papers
- Verifiable Generation with Subsentence-Level Fine-Grained Citations [13.931548733211436]
Verifiable generation requires large language models to cite source documents supporting their outputs.
Previous work mainly targets the generation of sentence-level citations, lacking specificity about which parts of a sentence are backed by the cited sources.
This work studies verifiable generation with subsentence-level fine-grained citations for more precise location of generated content supported by the cited sources.
arXiv Detail & Related papers (2024-06-10T09:32:37Z)
- Contextualizing Generated Citation Texts [11.531517736126657]
We propose a simple modification to the citation text generation task.
The generation target is not only the citation itself, but the entire context window, including the target citation.
arXiv Detail & Related papers (2024-02-28T05:24:21Z)
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- A survey on text generation using generative adversarial networks [0.0]
This work presents a thorough review concerning recent studies and text generation advancements using Generative Adversarial Networks.
The use of adversarial learning for text generation is promising, as it offers alternative ways to generate so-called "natural" language.
arXiv Detail & Related papers (2022-12-20T17:54:08Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation [27.064042116555925]
We propose a new task, namely context-aware text generation in the scientific domain.
We present a novel large-scale Scientific Paper dataset for Context-Aware Text Generation (SciXGen).
We comprehensively benchmark, using state-of-the-art models, the efficacy of our newly constructed SciXGen dataset in generating descriptions and paragraphs.
arXiv Detail & Related papers (2021-10-20T20:37:11Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
- From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information [77.89755281215079]
Text summarization is the research area that aims to create a short, condensed version of an original document.
In real-world applications, most of the data is not in a plain text format.
This paper surveys these new summarization tasks and approaches as they arise in real-world applications.
arXiv Detail & Related papers (2020-05-10T14:59:36Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.