Related papers: Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge

URL: http://arxiv.org/abs/2507.11330v2
Date: Wed, 16 Jul 2025 14:26:34 GMT
Title: Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge
Authors: Wenqing Wu, Chengzhi Zhang, Yi Zhao
Abstract summary: One of the most common types of novelty in academic papers is the introduction of new methods. In this paper, we propose leveraging human knowledge and LLMs to assist pretrained language models (PLMs) in predicting the method novelty of papers.
Score: 9.208744138848765
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Novelty is a crucial criterion in the peer review process for evaluating academic papers. Traditionally, it is judged by experts or measured by unique reference combinations. Both methods have limitations: experts have limited knowledge, and the effectiveness of the combination method is uncertain. Moreover, it is unclear whether unique citations truly measure novelty. Large language models (LLMs) possess a wealth of knowledge, while human experts possess judgment abilities that LLMs lack. Therefore, our research integrates the knowledge and abilities of LLMs and human experts to address the limitations of novelty assessment. One of the most common types of novelty in academic papers is the introduction of new methods. In this paper, we propose leveraging human knowledge and LLMs to assist pretrained language models (PLMs, e.g., BERT) in predicting the method novelty of papers. Specifically, we extract sentences related to the novelty of the academic paper from peer review reports and use an LLM to summarize the methodology section of the paper; both are then used to fine-tune PLMs. In addition, we have designed a text-guided fusion module with novel Sparse-Attention to better integrate human and LLM knowledge. We compare our proposed method with a large number of baselines, and extensive experiments demonstrate that it achieves superior performance.
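The abstract does not specify how the Sparse-Attention inside the text-guided fusion module works. Purely as an illustration, the sketch below assumes one common form of sparse attention, top-k cross-attention: each review-sentence vector (human knowledge) attends only to its k highest-scoring LLM-summary vectors, and the attended context is added back residually. The function names, the residual fusion, and the top-k sparsity pattern are all assumptions rather than the authors' design; a real model would apply learned projections to PLM embeddings instead of using raw vectors.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_cross_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector], k: int
) -> List[Vector]:
    """Scaled dot-product cross-attention where each query keeps only its top-k keys.

    Hypothetical illustration of sparse attention; all non-selected keys get weight 0.
    """
    if not queries:
        return []
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and len(keys)")
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [dot(q, key) * scale for key in keys]
        # Indices of the k largest scores; every other key is masked out.
        top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Numerically stable softmax over the surviving scores only.
        m = max(scores[i] for i in top)
        exps = {i: math.exp(scores[i] - m) for i in top}
        z = sum(exps.values())
        out = [0.0] * len(values[0])
        for i, e in exps.items():
            w = e / z
            for j, v in enumerate(values[i]):
                out[j] += w * v
        outputs.append(out)
    return outputs


def fuse(review_vecs: List[Vector], summary_vecs: List[Vector], k: int = 2) -> List[Vector]:
    """Hypothetical residual fusion: review vector + its sparse LLM-summary context."""
    attended = topk_sparse_cross_attention(review_vecs, summary_vecs, summary_vecs, k)
    return [[r + a for r, a in zip(rv, av)] for rv, av in zip(review_vecs, attended)]
```

With `k=1` each review vector simply picks up the single most similar summary vector; larger `k` spreads the weight over a few candidates while still ignoring the rest, which is the trade-off sparse attention makes against dense softmax over all keys.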

Related papers

Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey [64.08485471150486]
This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. We systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication.
arXiv (2025-03-28T14:08:40Z)
MoRE-LLM: Mixture of Rule Experts Guided by a Large Language Model [54.14155564592936]
We propose a Mixture of Rule Experts guided by a Large Language Model (MoRE-LLM). MoRE-LLM steers the discovery of local rule-based surrogates during training and their utilization for the classification task. The LLM is responsible for enhancing the domain knowledge alignment of the rules by correcting and contextualizing them.
arXiv (2025-03-26T11:09:21Z)
Large Language Models Penetration in Scholarly Writing and Peer Review [43.600778691549706]
We evaluate the penetration of Large Language Models across academic perspectives and dimensions. Our experiments demonstrate the effectiveness of LLMetrica, revealing the increasing role of LLMs in scholarly processes. These findings emphasize the need for transparency, accountability, and ethical practices in LLM usage to maintain academic credibility.
arXiv (2025-02-16T16:37:34Z)
The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs [21.97227334180969]
"LLM-as-an-annotator" and "LLM-as-a-judge" paradigms employ Large Language Models (LLMs) as annotators, judges, and evaluators in tasks traditionally performed by humans. Despite their role in shaping study results and insights, there is no standard or rigorous procedure to determine whether LLMs can replace human annotators. We propose a novel statistical procedure, the Alternative Annotator Test (alt-test), that requires only a modest subset of annotated examples to justify using LLM annotations.
arXiv (2025-01-19T07:09:11Z)
Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv (2024-12-02T16:55:03Z)
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments [22.76776244036282]
The grammatical knowledge of language models (LMs) is often measured using a benchmark of linguistic minimal pairs. Recent large language models (LLMs) are trained to perform tasks via prompting, and thus, the raw probabilities they assign may not fully reflect their grammatical knowledge. This study attempts to derive more accurate judgments from LLMs using prompts and templates.
arXiv (2024-08-19T01:53:47Z)
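The minimal-pair benchmark referred to in this entry is usually scored by counting how often a model assigns a higher probability to the acceptable member of each pair than to the unacceptable one. A minimal sketch of that accuracy computation follows; the `TOY_LOG_PROBS` dictionary is hypothetical toy data standing in for a real language model, and this is the raw-probability baseline the study argues may understate grammatical knowledge, not its prompt-based method.

```python
from typing import Callable, List, Tuple

# (acceptable sentence, unacceptable sentence) differing in one grammatical detail.
MinimalPair = Tuple[str, str]


def minimal_pair_accuracy(
    pairs: List[MinimalPair], log_prob: Callable[[str], float]
) -> float:
    """Fraction of pairs where the acceptable sentence gets the higher log-probability."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


# Hypothetical sentence log-probabilities standing in for an actual LM.
TOY_LOG_PROBS = {
    "The cats sleep.": -8.1,
    "The cats sleeps.": -9.4,
    "She has eaten.": -7.0,
    "She has ate.": -6.5,
}

pairs = [
    ("The cats sleep.", "The cats sleeps."),
    ("She has eaten.", "She has ate."),
]
acc = minimal_pair_accuracy(pairs, TOY_LOG_PROBS.__getitem__)  # 0.5
```

Here the toy model prefers the ungrammatical "She has ate.", so accuracy is 0.5; this kind of mismatch between raw probabilities and grammaticality is what motivates eliciting judgments through prompts instead.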
Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text [12.879551933541345]
Large Language Models (LLMs) are capable of generating human-like conversations. Conventional metrics like BLEU and ROUGE are inadequate for capturing the subtle semantics and contextual richness of such generative outputs. We propose a reference-guided verdict method that automates the evaluation process by leveraging multiple LLMs-as-judges.
arXiv (2024-08-17T16:01:45Z)
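The summary does not say how verdicts from the multiple LLM judges are combined. Assuming a simple majority vote, the aggregation step could look like the sketch below. The three judge functions are hypothetical string-matching stand-ins for actual LLM calls; each receives the question, the reference answer, and the candidate answer and returns a yes/no verdict.

```python
from typing import Callable, List

# (question, reference answer, candidate answer) -> does the candidate match?
Judge = Callable[[str, str, str], bool]


def reference_guided_verdict(
    question: str, reference: str, candidate: str, judges: List[Judge]
) -> bool:
    """Majority vote over judges; ties count as 'not correct' (assumed, conservative)."""
    if not judges:
        raise ValueError("need at least one judge")
    votes = [judge(question, reference, candidate) for judge in judges]
    return sum(votes) > len(votes) / 2


# Hypothetical placeholder judges; a real system would prompt different LLMs.
def exact_judge(question: str, reference: str, candidate: str) -> bool:
    return candidate == reference


def casefold_judge(question: str, reference: str, candidate: str) -> bool:
    return candidate.strip().casefold() == reference.strip().casefold()


def contains_judge(question: str, reference: str, candidate: str) -> bool:
    return reference.casefold() in candidate.casefold()


JUDGES = [exact_judge, casefold_judge, contains_judge]
ok = reference_guided_verdict("Capital of France?", "Paris", "paris", JUDGES)
```

Using an odd number of judges avoids ties entirely; with an even panel, the strict `>` makes a split decision fall back to rejecting the candidate.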
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [77.72128397088409]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question. We also propose a novel reinforcement learning paradigm to encourage the model to actively perform composite deduction.
arXiv (2024-04-19T15:53:27Z)
Exploring the landscape of large language models: Foundations, techniques, and challenges [8.042562891309414]
The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches. It explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learning frameworks. The ethical dimensions of LLM deployment are discussed, underscoring the need for mindful and responsible application.
arXiv (2024-04-18T08:01:20Z)
Comuniqa : Exploring Large Language Models for improving speaking skills [2.8227892155844088]
We investigate the potential of Large Language Models (LLMs) to improve English speaking skills. Recent advancements in Artificial Intelligence (AI) offer promising solutions to overcome limitations. We propose Comuniqa, a novel LLM-based system designed to enhance English speaking skills.
arXiv (2024-01-28T07:37:33Z)
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task [53.163534619649866]
This paper focuses on assessing the effectiveness of prompt-based techniques to empower Large Language Models to handle the task of quality estimation. We conducted systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting. Our work reveals that combining these approaches using a "small", open source model (orca_mini_v3_7B) yields competitive results.
arXiv (2023-11-01T17:44:35Z)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv (2023-08-14T15:13:04Z)
Aligning Large Language Models with Human: A Survey [53.6014921995006]
Large Language Models (LLMs) trained on extensive textual corpora have emerged as leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite their notable performance, these models are prone to certain limitations, such as misunderstanding human instructions or generating potentially biased or factually incorrect content. This survey presents a comprehensive overview of the technologies for aligning LLMs with human expectations.
arXiv (2023-07-24T17:44:58Z)
On Learning to Summarize with Large Language Models as References [101.79795027550959]
Summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets. We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv (2023-05-23T16:56:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.