Literature-Grounded Novelty Assessment of Scientific Ideas
- URL: http://arxiv.org/abs/2506.22026v1
- Date: Fri, 27 Jun 2025 08:47:28 GMT
- Title: Literature-Grounded Novelty Assessment of Scientific Ideas
- Authors: Simra Shahid, Marissa Radensky, Raymond Fok, Pao Siangliulue, Daniel S. Weld, Tom Hope
- Abstract summary: We propose the Idea Novelty Checker, an LLM-based retrieval-augmented generation framework. Our experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches.
- Score: 23.481266336046833
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated scientific idea generation systems have made remarkable progress, yet the automatic evaluation of idea novelty remains a critical and underexplored challenge. Manual evaluation of novelty through literature review is labor-intensive, prone to error due to subjectivity, and impractical at scale. To address these issues, we propose the Idea Novelty Checker, an LLM-based retrieval-augmented generation (RAG) framework that leverages a two-stage retrieve-then-rerank approach. The Idea Novelty Checker first collects a broad set of relevant papers using keyword and snippet-based retrieval, then refines this collection through embedding-based filtering followed by facet-based LLM re-ranking. It incorporates expert-labeled examples to guide the system in comparing papers for novelty evaluation and in generating literature-grounded reasoning. Our extensive experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches. Ablation studies further showcase the importance of the facet-based re-ranker in identifying the most relevant literature for novelty evaluation.
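The abstract sketches a concrete pipeline: broad retrieval, embedding-based filtering, facet-based LLM re-ranking, then a grounded verdict. The Python below is a minimal sketch of that flow under stated assumptions; `llm_rerank` and `llm_judge` are hypothetical callables standing in for the paper's prompt-based components, and the keyword stage is reduced to naive term overlap (this is not the authors' released code).

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    snippet: str
    embedding: list[float]  # precomputed document embedding

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def check_novelty(idea: str, idea_emb: list[float], corpus: list[Paper],
                  llm_rerank, llm_judge, k_filter: int = 50, k_rank: int = 10) -> str:
    # Stage 1: broad keyword/snippet retrieval (naive term overlap here).
    terms = set(idea.lower().split())
    candidates = [p for p in corpus
                  if terms & set((p.title + " " + p.snippet).lower().split())]
    # Stage 2a: embedding-based filtering down to the closest k_filter papers.
    candidates.sort(key=lambda p: cosine(idea_emb, p.embedding), reverse=True)
    shortlist = candidates[:k_filter]
    # Stage 2b: facet-based LLM re-ranking over problem/method/evaluation facets.
    ranked = llm_rerank(idea, shortlist, facets=("problem", "method", "evaluation"))
    # Final step: literature-grounded novelty verdict over the top-ranked papers.
    return llm_judge(idea, ranked[:k_rank])
```

In the system described above, the re-ranking and judging prompts also carry expert-labeled examples to steer the comparison; the sketch abstracts those details into the two callables.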
Related papers
- Navigating Through Paper Flood: Advancing LLM-based Paper Evaluation through Domain-Aware Retrieval and Latent Reasoning [30.92327406304362]
We present PaperEval, a novel framework for automated paper evaluation using Large Language Models (LLMs). PaperEval has two key components: 1) a domain-aware paper retrieval module that retrieves relevant concurrent work to support contextualized assessments of novelty and contributions, and 2) a latent reasoning mechanism that enables deep understanding of complex motivations and methodologies. Experiments on two datasets demonstrate that PaperEval consistently outperforms existing methods in both academic impact and paper quality evaluation.
arXiv Detail & Related papers (2025-08-07T08:08:13Z)
- From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review [11.761671590108406]
We introduce and explore a novel mechanism that employs LLM agents to perform pairwise comparisons among manuscripts. Our experiments demonstrate that this comparative approach significantly outperforms traditional rating-based methods in identifying high-impact papers.
arXiv Detail & Related papers (2025-06-12T22:27:20Z)
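As a rough illustration of the pairwise-comparison mechanism summarized above (not the paper's own code; `llm_compare` is a hypothetical judge callable), head-to-head LLM verdicts can be aggregated into a ranking by win count:

```python
from collections import Counter
from itertools import combinations

def rank_by_pairwise(manuscripts: dict[str, str], llm_compare) -> list[str]:
    """Rank manuscript ids by LLM head-to-head wins.

    llm_compare(text_a, text_b) is assumed to return "A" or "B",
    naming the stronger manuscript; a Bradley-Terry fit could replace
    the simple win count used here.
    """
    wins: Counter[str] = Counter({mid: 0 for mid in manuscripts})
    for id_a, id_b in combinations(manuscripts, 2):
        verdict = llm_compare(manuscripts[id_a], manuscripts[id_b])
        wins[id_a if verdict == "A" else id_b] += 1
    return [mid for mid, _ in wins.most_common()]
```

Exhaustive comparison is quadratic in the number of manuscripts, so a tournament or sampled-pairs scheme would likely be needed at scale.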
- Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) user prompts are often under-specified; (ii) retrieved candidate papers frequently contain irrelevant content; and (iii) task evaluation should move beyond shallow text-similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
- Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition [2.048226951354646]
Large language models (LLMs) have emerged as a potential solution for automating the complex processes involved in writing literature reviews. This study introduces a framework to automatically evaluate the performance of LLMs on three key tasks in literature review writing.
arXiv Detail & Related papers (2024-12-18T08:42:25Z)
- JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking [81.88787401178378]
We introduce JudgeRank, a novel agentic reranker that emulates human cognitive processes when assessing document relevance.
We evaluate JudgeRank on the reasoning-intensive BRIGHT benchmark, demonstrating substantial performance improvements over first-stage retrieval methods.
In addition, JudgeRank performs on par with fine-tuned state-of-the-art rerankers on the popular BEIR benchmark, validating its zero-shot generalization capability.
arXiv Detail & Related papers (2024-10-31T18:43:12Z)
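JudgeRank is summarized as an agentic reranker that reasons about relevance rather than scoring by embeddings alone. A speculative zero-shot sketch of that judge-then-sort pattern (the prompt wording and `llm` callable are assumptions, not JudgeRank's actual prompts):

```python
def judge_rerank(query: str, documents: list[str], llm) -> list[str]:
    """Rerank documents by an LLM-assigned relevance grade.

    llm(prompt) is a hypothetical callable returning text whose last
    line contains an integer relevance score from 0 to 10.
    """
    def grade(doc: str) -> int:
        prompt = (
            "Analyze the intent behind the query, summarize the document, "
            "and on the last line rate its relevance from 0 to 10.\n"
            f"Query: {query}\nDocument: {doc}"
        )
        lines = llm(prompt).strip().splitlines()
        digits = "".join(ch for ch in (lines[-1] if lines else "") if ch.isdigit())
        return int(digits) if digits else 0  # default to 0 on unparseable output

    return sorted(documents, key=grade, reverse=True)
```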
- Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents [64.64280477958283]
An exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions.
Recent developments in large language models (LLMs) suggest a promising avenue for automating the generation of novel research ideas. We propose a Chain-of-Ideas (CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development of a research domain.
arXiv Detail & Related papers (2024-10-17T03:26:37Z)
- Evaluating and Enhancing Large Language Models for Novelty Assessment in Scholarly Publications [12.183473842592567]
We introduce a scholarly novelty benchmark (SchNovel) to evaluate large language models' ability to assess novelty in scholarly papers.
SchNovel consists of 15,000 pairs of papers across six fields, sampled from arXiv, with publication dates 2 to 10 years apart.
Their proposed RAG-Novelty method simulates the review process of human reviewers by retrieving similar papers to assess novelty.
arXiv Detail & Related papers (2024-09-25T04:12:38Z)
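Since SchNovel is built from paper pairs, evaluation reduces to checking how often a judge picks the ground-truth more-novel paper. A minimal sketch of that scoring loop, with `llm_pick_novel` as a hypothetical judge callable (not the benchmark's released harness):

```python
def schnovel_accuracy(pairs: list[tuple[str, str, str]], llm_pick_novel) -> float:
    """Fraction of pairs where the judge picks the more novel paper.

    Each pair is (paper_a, paper_b, gold) with gold in {"A", "B"};
    llm_pick_novel(paper_a, paper_b) is assumed to return "A" or "B".
    """
    correct = sum(llm_pick_novel(a, b) == gold for a, b, gold in pairs)
    return correct / len(pairs)
```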
- Good Idea or Not, Representation of LLM Could Tell [86.36317971482755]
We focus on idea assessment, which aims to leverage the knowledge of large language models to assess the merit of scientific ideas.
We release a benchmark dataset built from nearly four thousand full-text manuscripts, designed to train and evaluate the performance of different approaches to this task.
Our findings suggest that the representations of large language models hold more potential in quantifying the value of ideas than their generative outputs.
arXiv Detail & Related papers (2024-09-07T02:07:22Z)
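The finding above, that hidden representations carry more signal about idea quality than generated text, is commonly operationalized as a probe over the model's hidden states. A speculative sketch assuming a Hugging Face-style model and scikit-learn (the model choice and mean pooling are illustrative, not the paper's setup):

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

def idea_embeddings(ideas: list[str], model_name: str = "gpt2") -> np.ndarray:
    """Mean-pool the last hidden layer as a fixed-size idea representation."""
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tok(ideas, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)    # zero out padding positions
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.numpy()

# Linear probe mapping representations to good/bad idea labels;
# the two ideas below are placeholders for an annotated training set.
ideas, labels = ["idea text one", "idea text two"], [1, 0]
probe = LogisticRegression().fit(idea_embeddings(ideas), labels)
```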
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [66.93260816493553]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios. With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance. Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- $T^5Score$: A Methodology for Automatically Assessing the Quality of LLM Generated Multi-Document Topic Sets [16.516381474175986]
We introduce $T^5Score$, an evaluation methodology that decomposes the quality of a topic into quantifiable aspects. This framing enables a convenient manual or automatic evaluation procedure resulting in strong inter-annotator agreement.
arXiv Detail & Related papers (2024-07-24T16:14:15Z)
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances, each containing one prompt and four candidate papers that vary in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods, and designs experiments, while iteratively refining them. We experimentally validate ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
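The ResearchAgent entry describes an iterative generate-and-refine loop over the literature, with reviewing feedback folded back into the idea. A minimal sketch under stated assumptions, where `llm` and each entry of `reviewers` are hypothetical text-to-text callables (not the paper's implementation):

```python
def research_agent_loop(seed_papers: list[str], llm, reviewers, rounds: int = 3) -> str:
    """Draft a research idea from seed papers, then refine it iteratively."""
    context = "\n".join(seed_papers)
    idea = llm("Given these papers, define a novel problem, propose a method, "
               f"and design an experiment:\n{context}")
    for _ in range(rounds):
        # Collect critiques from reviewing agents and fold them into a revision.
        critiques = "\n".join(review(idea) for review in reviewers)
        idea = llm(f"Revise the idea to address the critiques.\nIdea:\n{idea}\n"
                   f"Critiques:\n{critiques}")
    return idea
```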