CiteME: Can Language Models Accurately Cite Scientific Claims?
- URL: http://arxiv.org/abs/2407.12861v1
- Date: Wed, 10 Jul 2024 11:31:20 GMT
- Title: CiteME: Can Language Models Accurately Cite Scientific Claims?
- Authors: Ori Press, Andreas Hochlehnert, Ameya Prabhu, Vishaal Udandarao, Ofir Press, Matthias Bethge
- Abstract summary: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper?
Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper.
Evaluation on CiteME reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%.
We close this gap by introducing CiteAgent, an autonomous system built on GPT-4o that can also search for and read papers, achieving an accuracy of 35.3% on CiteME.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Thousands of new scientific papers are published each month. Such information overload complicates researchers' efforts to stay current with the state of the art and to verify and correctly attribute claims. We pose the following research question: given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. Evaluation on CiteME reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We close this gap by introducing CiteAgent, an autonomous system built on GPT-4o that can also search for and read papers, which achieves an accuracy of 35.3% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.
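The abstract describes CiteAgent as an LM given an action budget and the ability to search for and read papers before committing to a citation. A minimal sketch of what such a search-and-read loop might look like follows; it is illustrative only, not the paper's implementation, and `llm_complete`, `search_papers`, and `read_paper` are hypothetical stand-ins for an LM API and a paper-search backend.

```python
"""Minimal sketch of a CiteAgent-style search-and-read loop.

Hypothetical stand-ins (not from the paper): llm_complete, search_papers,
read_paper. Wire these to a real LM API and paper index to actually run it.
"""

def llm_complete(prompt: str) -> str:
    """Return one LM completion (e.g. from GPT-4o). Placeholder."""
    raise NotImplementedError("connect to an LM API here")

def search_papers(query: str) -> list[tuple[str, str]]:
    """Return (paper_id, title) hits from an academic search index. Placeholder."""
    raise NotImplementedError("connect to a paper-search backend here")

def read_paper(paper_id: str) -> str:
    """Return (an excerpt of) a paper's full text. Placeholder."""
    raise NotImplementedError("connect to a paper store here")

MAX_STEPS = 10  # action budget before the agent gives up

PROMPT = """You are given an excerpt from a paper that cites exactly one
other paper. Identify that cited paper. Reply with exactly one action:
SEARCH: <query>     -- query an academic search index
READ: <paper_id>    -- read a candidate paper
CITE: <paper_id>    -- commit to a final answer

Excerpt: {excerpt}

Observations so far:
{observations}
"""

def cite_agent(excerpt: str) -> str | None:
    """Loop: let the LM search and read until it commits to a citation."""
    observations: list[str] = []
    for _ in range(MAX_STEPS):
        action = llm_complete(PROMPT.format(
            excerpt=excerpt,
            observations="\n".join(observations) or "(none)",
        )).strip()
        if action.startswith("SEARCH:"):
            hits = search_papers(action.removeprefix("SEARCH:").strip())
            observations.append(
                "results: " + "; ".join(f"{pid} ({title})" for pid, title in hits)
            )
        elif action.startswith("READ:"):
            pid = action.removeprefix("READ:").strip()
            observations.append(f"{pid} says: {read_paper(pid)[:2000]}")  # truncate long texts
        elif action.startswith("CITE:"):
            return action.removeprefix("CITE:").strip()  # final attribution
        else:
            observations.append("invalid action; use SEARCH:, READ:, or CITE:")
    return None  # budget exhausted without an answer
```

The key design point, per the abstract, is that the agent grounds its answer in retrieved papers rather than parametric memory alone, which plausibly accounts for the reported jump from 4.2-18.5% to 35.3% accuracy.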
Related papers
- Attribute or Abstain: Large Language Models as Long Document Assistants [58.32043134560244]
We present a benchmark of 6 diverse long document tasks with attribution, and experiment with different approaches to attribution on 4 LLMs of different sizes.
We find that citation, i.e. response generation and evidence extraction in one step, mostly performs best.
We also find that evidence quality can predict response quality on datasets with simple responses, but not so for complex responses.
arXiv Detail & Related papers (2024-07-10T16:16:02Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation [76.31621715032558]
Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses.
We introduce CaLM, a novel verification framework.
Our framework empowers smaller LMs, which rely less on parametric memory, to validate the output of larger LMs.
arXiv Detail & Related papers (2024-06-08T06:04:55Z)
- REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs [41.64918533152914]
We investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries.
From around 20K research articles, we draw several conclusions about public and proprietary LLMs.
Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.
arXiv Detail & Related papers (2024-05-03T16:38:51Z)
- UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models [73.73303148524398]
Large language models (LLMs) may generate text that lacks consistency with human knowledge, leading to factual inaccuracies or hallucination.
We propose UFO, an LLM-based unified and flexible evaluation framework to verify facts against plug-and-play fact sources.
arXiv Detail & Related papers (2024-02-22T16:45:32Z)
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z)
- When Large Language Models Meet Citation: A Survey [37.01594297337486]
Large Language Models (LLMs) could be helpful in capturing fine-grained citation information via the corresponding textual context.
Citations also establish connections among scientific papers, providing high-quality inter-document relationships.
We review the application of LLMs for in-text citation analysis tasks, including citation classification, citation-based summarization, and citation recommendation.
arXiv Detail & Related papers (2023-09-18T12:48:48Z)
- Enabling Large Language Models to Generate Text with Citations [37.64884969997378]
Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
arXiv Detail & Related papers (2023-05-24T01:53:49Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)