What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation
- URL: http://arxiv.org/abs/2602.06054v2
- Date: Wed, 11 Feb 2026 12:35:37 GMT
- Title: What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation
- Authors: Abeer Mostafa, Thi Huyen Nguyen, Zahra Ahmadi
- Abstract summary: We introduce a literature-aware novelty assessment framework that learns how humans judge novelty from peer-review reports. Using nearly 80K novelty-annotated reviews from top-tier AI conferences, we fine-tune a large language model to capture reviewer-aligned novelty evaluation behavior.
- Score: 4.14197005718384
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Assessing research novelty is a core yet highly subjective aspect of peer review, typically based on implicit judgment and incomplete comparison to prior work. We introduce a literature-aware novelty assessment framework that explicitly learns how humans judge novelty from peer-review reports and grounds these judgments in structured comparison to existing research. Using nearly 80K novelty-annotated reviews from top-tier AI conferences, we fine-tune a large language model to capture reviewer-aligned novelty evaluation behavior. For a given manuscript, the system extracts structured representations of its ideas, methods, and claims, retrieves semantically related papers, and constructs a similarity graph that enables fine-grained, concept-level comparison to prior work. Conditioning on this structured evidence, the model produces calibrated novelty scores and human-like explanatory assessments, reducing overestimation and improving consistency relative to existing approaches.
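The abstract walks through a concrete pipeline: extract structured representations of a manuscript, retrieve related papers, build a concept-level similarity graph, and condition the novelty score on that evidence. Below is a minimal sketch of that flow, with TF-IDF standing in for the paper's semantic retriever and a toy heuristic standing in for the fine-tuned LLM scorer; all function names are illustrative, not the authors' implementation.

```python
# Illustrative sketch of the pipeline the abstract describes: structured
# extraction -> retrieval of related work -> similarity graph -> score.
# TF-IDF stands in for the paper's semantic retriever, and the scorer is
# a toy heuristic, not the authors' fine-tuned LLM.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_concepts(manuscript: str) -> list[str]:
    # Placeholder for structured extraction of ideas, methods, and claims;
    # the paper uses a fine-tuned LLM here, we just split on sentences.
    return [s.strip() for s in manuscript.split(".") if s.strip()]

def build_similarity_graph(concepts: list[str], prior: list[str],
                           threshold: float = 0.2) -> nx.Graph:
    # Link each manuscript concept to prior-work abstracts whose cosine
    # similarity clears the threshold, enabling concept-level comparison.
    vec = TfidfVectorizer().fit(concepts + prior)
    sims = cosine_similarity(vec.transform(concepts), vec.transform(prior))
    graph = nx.Graph()
    for i in range(len(concepts)):
        for j in range(len(prior)):
            if sims[i, j] >= threshold:
                graph.add_edge(("concept", i), ("prior", j),
                               weight=float(sims[i, j]))
    return graph

def novelty_score(graph: nx.Graph, n_concepts: int) -> float:
    # Toy evidence-conditioned score: the larger the share of concepts
    # with strong matches in prior work, the lower the novelty.
    matched = sum(1 for node in graph.nodes if node[0] == "concept")
    return 1.0 - matched / max(n_concepts, 1)

manuscript = ("We build a similarity graph over extracted claims. "
              "We score novelty from graph evidence.")
prior_work = ["A graph-based method for comparing scientific claims.",
              "Survey of novelty detection in data streams."]
concepts = extract_concepts(manuscript)
graph = build_similarity_graph(concepts, prior_work)
print(f"toy novelty score: {novelty_score(graph, len(concepts)):.2f}")
```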
Related papers
- ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review [48.60540055009675]
ScholarPeer is a search-enabled multi-agent framework designed to emulate the cognitive processes of a senior researcher. We evaluate ScholarPeer on DeepReview-13K, and the results demonstrate that ScholarPeer achieves significant win-rates against state-of-the-art approaches in side-by-side evaluations.
arXiv Detail & Related papers (2026-01-30T06:54:55Z) - NoveltyRank: Estimating Conceptual Novelty of AI Papers [8.218640708170119]
This project aims to develop a model that estimates and ranks conceptual novelty of AI papers. Our approach evaluates novelty primarily through a paper's title, abstract, and semantic similarity to prior literature. We fine-tune Qwen3-4B-Instruct-2507 and SciBERT on both tasks, benchmarking against GPT-5.1 to analyze how task formulation and modeling choices affect performance.
arXiv Detail & Related papers (2025-12-12T03:33:32Z) - Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback [81.0031690510116]
We present a structured approach for automated novelty evaluation that models expert reviewer behavior through three stages. Our method is informed by a large-scale analysis of human-written novelty reviews. Evaluated on 182 ICLR 2025 submissions, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions.
arXiv Detail & Related papers (2025-08-14T16:18:37Z) - Literature-Grounded Novelty Assessment of Scientific Ideas [23.481266336046833]
We propose the Idea Novelty Checker, an LLM-based retrieval-augmented generation framework. Our experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches.
arXiv Detail & Related papers (2025-06-27T08:47:28Z) - GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews [25.291384842659397]
We introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews.
Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews.
arXiv Detail & Related papers (2024-06-11T15:27:01Z) - SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
We take a dramatic departure with a novel setting in which models take background contexts as input.
We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z) - SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z) - Just Rank: Rethinking Evaluation with Word and Sentence Similarities [105.5541653811528]
Intrinsic evaluation for embeddings lags far behind, with no significant update in the past decade.
This paper first points out the problems using semantic similarity as the gold standard for word and sentence embedding evaluations.
We propose a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks; a minimal sketch of the ranking idea appears below.
arXiv Detail & Related papers (2022-03-05T08:40:05Z)
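As a hedged illustration of ranking-style intrinsic evaluation: a good embedding space should rank a known-similar word above unrelated background words. The hits@k formulation and all names below are our assumptions for the sketch, not EvalRank's exact protocol.

```python
# Sketch of a ranking-style intrinsic evaluation (our illustration of the
# general idea, not EvalRank's exact protocol): a good embedding space
# should rank a known-similar word above unrelated background words.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def hits_at_k(emb: dict, pairs: list, background: list, k: int = 3) -> float:
    # For each (query, positive) pair, rank the positive against the
    # background pool by similarity to the query; count top-k hits.
    hits = 0
    for query, positive in pairs:
        pool = [positive] + [w for w in background if w not in (query, positive)]
        ranked = sorted(pool, key=lambda w: cosine(emb[query], emb[w]), reverse=True)
        hits += positive in ranked[:k]
    return hits / len(pairs)

# Toy usage with random vectors; a real evaluation plugs in trained embeddings.
rng = np.random.default_rng(0)
vocab = ["cat", "kitten", "car", "engine", "tree", "forest"]
emb = {w: rng.normal(size=16) for w in vocab}
print(hits_at_k(emb, [("cat", "kitten"), ("tree", "forest")], vocab))
```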
SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis [20.026835809227283]
We introduce "typicality", a new formulation of evaluation rooted in information theory.
We show how these decomposed dimensions of semantics and fluency provide greater system-level insight into captioner differences.
Our proposed metrics along with their combination, SMURF, achieve state-of-the-art correlation with human judgment when compared with other rule-based evaluation metrics.
arXiv Detail & Related papers (2021-06-02T19:58:20Z) - Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning [66.30909748400023]
We propose to evaluate the summary qualities without reference summaries by unsupervised contrastive learning.
Specifically, we design a new BERT-based metric that covers both linguistic quality and semantic informativeness.
Experiments on Newsroom and CNN/Daily Mail demonstrate that our new evaluation method outperforms other metrics even without reference summaries.
arXiv Detail & Related papers (2020-10-05T05:04:14Z)
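The last entry above describes a reference-free, BERT-based quality metric. As a hedged illustration of the reference-free idea only: score a summary by its semantic similarity to the source document using mean-pooled BERT embeddings. The model choice and pooling scheme are our assumptions; the paper's actual metric is trained with contrastive learning and also covers linguistic quality.

```python
# Sketch of the reference-free idea: cosine similarity between mean-pooled
# BERT embeddings of a source document and its summary. Illustrates only
# the semantic-informativeness half; the paper's metric is trained with
# contrastive learning and also covers linguistic quality.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
    hidden = model(**inputs).last_hidden_state           # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)        # mask out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

def informativeness(document: str, summary: str) -> float:
    return float(torch.cosine_similarity(embed(document), embed(summary)))

print(informativeness("The cat sat on the mat all day.", "A cat rested on a mat."))
```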