CiteBART: Learning to Generate Citations for Local Citation Recommendation
- URL: http://arxiv.org/abs/2412.17534v1
- Date: Mon, 23 Dec 2024 12:58:30 GMT
- Title: CiteBART: Learning to Generate Citations for Local Citation Recommendation
- Authors: Ege Yiğit Çelik, Selma Tekir
- Abstract summary: This paper proposes CiteBART, a custom BART pre-training method based on citation token masking, to generate citations for local citation recommendation (LCR).
In the base scheme, we mask the citation token in the local citation context to make the citation prediction.
In the global one, we concatenate the citing paper's title and abstract to the local citation context to learn to reconstruct the citation token.
The effect is significant in the larger benchmarks, e.g., Refseer and ArXiv.
- Abstract: Citations are essential building blocks in scientific writing. The scientific community is longing for support in their generation. Citation generation involves two complementary subtasks: Determining the citation worthiness of a context and, if it's worth it, proposing the best candidate papers for the citation placeholder. The latter subtask is called local citation recommendation (LCR). This paper proposes CiteBART, a custom BART pre-training based on citation token masking to generate citations to achieve LCR. In the base scheme, we mask the citation token in the local citation context to make the citation prediction. In the global one, we concatenate the citing paper's title and abstract to the local citation context to learn to reconstruct the citation token. CiteBART outperforms state-of-the-art approaches on the citation recommendation benchmarks except for the smallest FullTextPeerRead dataset. The effect is significant in the larger benchmarks, e.g., Refseer and ArXiv. We present a qualitative analysis and an ablation study to provide insights into the workings of CiteBART. Our analyses confirm that its generative nature brings about a zero-shot capability.
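The two pre-training schemes described in the abstract can be sketched as data-preparation routines. This is a minimal illustrative sketch, not the authors' code: the function names, the example context, and the use of BART's `<mask>` infilling token as the placeholder are assumptions for illustration.

```python
# Illustrative sketch of CiteBART-style training-pair construction.
# Assumption: the citation token is masked with BART's <mask> infilling token.
MASK = "<mask>"


def base_example(local_context: str, citation_token: str) -> tuple[str, str]:
    """Base scheme: mask the citation token in the local citation context;
    the model learns to generate the citation token as the target."""
    masked = local_context.replace(citation_token, MASK, 1)
    return masked, citation_token


def global_example(local_context: str, citation_token: str,
                   title: str, abstract: str) -> tuple[str, str]:
    """Global scheme: concatenate the citing paper's title and abstract
    to the masked local context; the target is unchanged."""
    masked, target = base_example(local_context, citation_token)
    return f"{title} {abstract} {masked}", target


# Hypothetical local citation context with a citation placeholder.
ctx = "Transformers [vaswani2017attention] replaced recurrence with attention."
inp, tgt = base_example(ctx, "[vaswani2017attention]")
# inp: "Transformers <mask> replaced recurrence with attention."
# tgt: "[vaswani2017attention]"
```

At inference time, the model generates the citation token for the `<mask>` position directly, which is what gives the approach its generative (and hence zero-shot) character.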
Related papers
- SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models [51.90867482317985]
SelfCite is a self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for statements in generated responses.
Instead of relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation.
The effectiveness of SelfCite is demonstrated by an increase in citation F1 of up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
arXiv Detail & Related papers (2025-02-13T18:55:13Z) - Citation Recommendation based on Argumentative Zoning of User Queries [7.596930973436683]
Argumentative zoning identifies the argumentative and rhetorical structure in scientific literature.
In this paper, a multi-task learning model is built for citation recommendation and argumentative zoning classification.
arXiv Detail & Related papers (2025-01-30T12:08:00Z) - ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
arXiv Detail & Related papers (2024-06-19T09:16:14Z) - ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation [31.259805200946175]
We introduce the evidence-grounded local citation recommendation task, where the target latent space comprises evidence spans for recommending specific papers.
Unlike past formulations that simply output recommendations, ILCiteR retrieves ranked lists of (evidence span, recommended paper) pairs.
We contribute a novel dataset for the evidence-grounded local citation recommendation task and demonstrate the efficacy of our proposed conditional neural rank-ensembling approach for re-ranking evidence spans.
arXiv Detail & Related papers (2024-03-13T17:38:05Z) - Contextualizing Generated Citation Texts [11.531517736126657]
We propose a simple modification to the citation text generation task.
The generation target is not only the citation itself, but the entire context window, including the target citation.
arXiv Detail & Related papers (2024-02-28T05:24:21Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - QuoteR: A Benchmark of Quote Recommendation for Writing [80.83859760380616]
We build a large and fully open quote recommendation dataset called QuoteR.
We conduct an extensive evaluation of existing quote recommendation methods on QuoteR.
We propose a new quote recommendation model that significantly outperforms previous methods on all three parts of QuoteR.
arXiv Detail & Related papers (2022-02-26T14:01:44Z) - Towards generating citation sentences for multiple references with intent control [86.53829532976303]
We build a novel generation model with the Fusion-in-Decoder approach to cope with multiple long inputs.
Experiments demonstrate that the proposed approaches provide much more comprehensive features for generating citation sentences.
arXiv Detail & Related papers (2021-12-02T15:32:24Z) - Citations are not opinions: a corpus linguistics approach to understanding how citations are made [0.0]
A key issue in citation content analysis is identifying linguistic structures that characterize distinct classes of citations.
In this study, we start with a large sample of a pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset.
By generating comparison tables for each citation type, we present a number of interesting linguistic features that uniquely characterize each citation type.
arXiv Detail & Related papers (2021-04-16T12:52:27Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.