NoveltyRank: Estimating Conceptual Novelty of AI Papers
- URL: http://arxiv.org/abs/2512.14738v1
- Date: Fri, 12 Dec 2025 03:33:32 GMT
- Title: NoveltyRank: Estimating Conceptual Novelty of AI Papers
- Authors: Zhengxu Yan, Han Li, Yuming Feng
- Abstract summary: This project aims to develop a model that estimates and ranks the conceptual novelty of AI papers. Our approach evaluates novelty primarily through a paper's title, abstract, and semantic similarity to prior literature. We fine-tune Qwen3-4B-Instruct-2507 and SciBERT on both tasks, benchmarking against GPT-5.1 to analyze how task formulation and modeling choices affect performance.
- Score: 8.218640708170119
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: With the growing ease of academic publishing, the volume of research papers, especially in AI-related fields, has surged dramatically. This flood of publications makes it difficult for truly novel and impactful work to stand out, and manual novelty assessment is often unstable and time-consuming. Our project aims to develop a model that estimates and ranks the conceptual novelty of AI papers, enabling a data-driven and scalable assessment of research originality. Such a system can help researchers efficiently identify submissions that introduce genuinely innovative ideas rather than minor variants, and provide conference reviewers with a quantitative and consistent signal of novelty. Our approach evaluates novelty primarily through a paper's title, abstract, and semantic similarity to prior literature. Given the motivation of novelty estimation, we explore two task formulations with different modeling objectives, each offering a different perspective: (1) binary classification, which predicts the paper's absolute novelty from learned patterns of prior novel works, and (2) pairwise novelty comparison, which learns to distinguish papers by relative novelty over others. We fine-tune Qwen3-4B-Instruct-2507 and SciBERT on both tasks, benchmarking against GPT-5.1 to analyze how task formulation and modeling choices affect performance. The implementation is publicly available at https://github.com/ZhengxuYan/NoveltyRank.
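To make the two task formulations concrete, the sketch below shows one plausible way to frame them, together with a crude embedding-based proxy for semantic similarity to prior literature. This is a minimal illustration, not the authors' released code (see the GitHub repository above): the encoder choice, prompt wording, and the `similarity_novelty_signal` helper are all hypothetical assumptions.

```python
# Illustrative sketch only; the paper's actual pipeline is at
# https://github.com/ZhengxuYan/NoveltyRank. The encoder choice, prompt
# wording, and similarity-based novelty proxy below are assumptions.
from dataclasses import dataclass

from sentence_transformers import SentenceTransformer, util


@dataclass
class Paper:
    title: str
    abstract: str


def classification_prompt(paper: Paper) -> str:
    # Formulation (1): binary classification over a single paper,
    # judging absolute novelty from its title and abstract.
    return (
        "Is the following AI paper conceptually novel? "
        "Answer 'novel' or 'not novel'.\n"
        f"Title: {paper.title}\nAbstract: {paper.abstract}"
    )


def pairwise_prompt(a: Paper, b: Paper) -> str:
    # Formulation (2): pairwise comparison, yielding a relative
    # novelty ranking between two papers.
    return (
        "Which paper is more conceptually novel, A or B?\n"
        f"A: {a.title}. {a.abstract}\n"
        f"B: {b.title}. {b.abstract}"
    )


# Crude proxy for "semantic similarity to prior literature": embed the
# candidate and prior abstracts, then treat distance from the nearest
# prior work as a novelty signal.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder


def similarity_novelty_signal(candidate: Paper, prior: list[Paper]) -> float:
    cand_emb = encoder.encode(candidate.abstract, convert_to_tensor=True)
    prior_embs = encoder.encode([p.abstract for p in prior], convert_to_tensor=True)
    max_sim = util.cos_sim(cand_emb, prior_embs).max().item()
    return 1.0 - max_sim  # higher = farther from existing work
```

In practice, prompts like these would serve as fine-tuning inputs for Qwen3-4B-Instruct-2507, while SciBERT would be trained with a classification or ranking head over the same title/abstract text.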
Related papers
- What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation [4.14197005718384]
We introduce a literature-aware novelty assessment framework that learns how humans judge novelty from peer-review reports. Using nearly 80K novelty-annotated reviews from top-tier AI conferences, we fine-tune a large language model to capture reviewer-aligned novelty evaluation behavior.
arXiv Detail & Related papers (2026-01-14T16:49:39Z)
- Literature-Grounded Novelty Assessment of Scientific Ideas [23.481266336046833]
We propose the Idea Novelty Checker, an LLM-based retrieval-augmented generation framework. Our experiments demonstrate that our novelty checker achieves approximately 13% higher agreement than existing approaches.
arXiv Detail & Related papers (2025-06-27T08:47:28Z)
- RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprising 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
- Chain-of-Factors Paper-Reviewer Matching [32.86512592730291]
We propose a unified model for paper-reviewer matching that jointly considers semantic, topic, and citation factors. We demonstrate the effectiveness of our proposed Chain-of-Factors model in comparison with state-of-the-art paper-reviewer matching methods and scientific pre-trained language models.
arXiv Detail & Related papers (2023-10-23T01:29:18Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses drawn from a wide range of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature.
We take a dramatic departure with a novel setting in which models use background contexts as input.
We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z)
- A Unified Evaluation Framework for Novelty Detection and Accommodation in NLP with an Instantiation in Authorship Attribution [25.52598351435189]
We introduce 'NoveltyTask', a multi-stage task to evaluate a system's performance on pipelined novelty 'detection' and 'accommodation' tasks.
We use the Amazon reviews corpus and compile a large dataset (consisting of 250k instances across 200 authors/labels) for NoveltyTask.
arXiv Detail & Related papers (2023-05-08T22:37:30Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model-adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.