Cited Text Spans for Citation Text Generation
- URL: http://arxiv.org/abs/2309.06365v2
- Date: Tue, 20 Feb 2024 23:31:22 GMT
- Title: Cited Text Spans for Citation Text Generation
- Authors: Xiangci Li, Yi-Hui Lee, Jessica Ouyang
- Abstract summary: An automatic citation generation system aims to concisely and accurately describe the relationship between two scientific articles.
Due to the length of scientific documents, existing abstractive approaches have conditioned only on cited paper abstracts.
We propose to condition instead on the cited text span (CTS) as an alternative to the abstract.
- Score: 12.039469573641217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: An automatic citation generation system aims to concisely and accurately
describe the relationship between two scientific articles. To do so, such a
system must ground its outputs to the content of the cited paper to avoid
non-factual hallucinations. Due to the length of scientific documents, existing
abstractive approaches have conditioned only on cited paper abstracts. We
demonstrate empirically that the abstract is not always the most appropriate
input for citation generation and that models trained in this way learn to
hallucinate. We propose to condition instead on the cited text span (CTS) as an
alternative to the abstract. Because manual CTS annotation is extremely time-
and labor-intensive, we experiment with distant labeling of candidate CTS
sentences, achieving sufficiently strong performance to substitute for
expensive human annotations in model training, and we propose a
human-in-the-loop, keyword-based CTS retrieval approach that makes generating
citation texts grounded in the full text of cited papers both promising and
practical.
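As a rough illustration of the keyword-based retrieval idea mentioned in the abstract, the sketch below ranks sentences from a cited paper's full text by how many user-supplied keywords they contain. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the naive sentence splitter, the single-token keyword matching, and the sample text are all hypothetical.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def rank_sentences_by_keywords(full_text, keywords, top_k=3):
    """Return up to top_k sentences, ranked by the number of distinct keywords matched.

    Hypothetical helper: keywords are assumed to be single tokens chosen by a human.
    """
    wanted = {k.lower() for k in keywords}
    scored = []
    for idx, sentence in enumerate(split_sentences(full_text)):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        score = len(wanted & tokens)
        if score > 0:
            scored.append((score, idx, sentence))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sentence for _, _, sentence in scored[:top_k]]

# Made-up cited-paper text, for illustration only.
cited_paper = (
    "We introduce a graph encoder for citation networks. "
    "The encoder is trained on 10K papers. "
    "Results show gains on link prediction."
)
candidates = rank_sentences_by_keywords(cited_paper, ["encoder", "trained"], top_k=2)
print(candidates)
```

In a human-in-the-loop setting, the user would inspect the returned candidates and refine the keywords before the selected sentences are passed to a generator; that loop is left out here.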
Related papers
- HLM-Cite: Hybrid Language Model Workflow for Text-based Scientific Citation Prediction [14.731720495144112]
We introduce the novel concept of core citation, which identifies the critical references that go beyond superficial mentions.
We propose HLM-Cite, a Hybrid Language Model workflow for citation prediction.
We evaluate HLM-Cite across 19 scientific fields, demonstrating a 17.6% performance improvement compared with SOTA methods.
arXiv Detail & Related papers (2024-10-10T10:46:06Z)
- Context-Enhanced Language Models for Generating Multi-Paper Citations [35.80247519023821]
We propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences.
Our approach involves a single source paper and a collection of target papers, culminating in a coherent paragraph containing multi-sentence citation text.
arXiv Detail & Related papers (2024-04-22T04:30:36Z)
- Contextualizing Generated Citation Texts [11.531517736126657]
We propose a simple modification to the citation text generation task.
The generation target is not only the citation itself, but the entire context window, including the target citation.
arXiv Detail & Related papers (2024-02-28T05:24:21Z)
- SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation [11.186252009101077]
We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper.
SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system.
arXiv Detail & Related papers (2023-06-06T09:34:45Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z)
- Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals [69.76097138157816]
Probabilistic text generators have been used to produce fake scientific papers for more than a decade.
Complex AI-powered generation techniques produce texts indistinguishable from those of humans.
Some websites offer to rewrite texts for free, generating gobbledegook full of tortured phrases.
arXiv Detail & Related papers (2021-07-12T20:47:08Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.