XtraGPT: Context-Aware and Controllable Academic Paper Revision
- URL: http://arxiv.org/abs/2505.11336v3
- Date: Thu, 23 Oct 2025 14:49:19 GMT
- Title: XtraGPT: Context-Aware and Controllable Academic Paper Revision
- Authors: Nuo Chen, Andre Lin HuiKai, Jiaying Wu, Junyi Hou, Zining Zhang, Qian Wang, Xidong Wang, Bingsheng He
- Abstract summary: We propose a human-AI collaboration framework for academic paper revision centered on criteria-guided intent alignment and context-aware modeling. We instantiate the framework in XtraGPT, the first suite of open-source LLMs for context-aware, instruction-guided writing assistance.
- Score: 43.263488839387584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the growing adoption of large language models (LLMs) in academic workflows, their ability to support high-quality scientific writing remains limited. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, such as conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process not well supported by direct prompting-based paradigms. To address these needs, we propose a human-AI collaboration framework for academic paper revision centered on criteria-guided intent alignment and context-aware modeling. To validate the framework, we curate a dataset of 7,000 research papers from top-tier venues, annotated with 140,000 instruction-response pairs that reflect realistic, section-level scientific revisions. We instantiate the framework in XtraGPT, the first suite of open-source LLMs (1.5B to 14B parameters) for context-aware, instruction-guided writing assistance. Extensive experiments validate that XtraGPT significantly outperforms same-scale baselines and approaches the quality of proprietary systems. Both automated preference assessments and human evaluations confirm the effectiveness of XtraGPT in improving scientific drafts.
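The framework's inputs described in the abstract can be illustrated with a minimal sketch: a section-level instruction, a revision criterion, and the full-paper context are assembled into a single prompt for the model. All function and parameter names below are hypothetical illustrations; the abstract does not specify XtraGPT's actual prompt format.

```python
def build_revision_prompt(paper_context: str, section: str,
                          criterion: str, instruction: str) -> str:
    """Assemble a context-aware, criteria-guided revision prompt.

    Hypothetical illustration of the framework's inputs: the full-paper
    context grounds the revision, the criterion constrains its intent,
    and the instruction targets a specific section-level edit.
    """
    return (
        f"Full paper context:\n{paper_context}\n\n"
        f"Section to revise:\n{section}\n\n"
        f"Revision criterion: {criterion}\n"
        f"Instruction: {instruction}\n"
        "Rewrite the section so it satisfies the criterion "
        "while staying consistent with the rest of the paper."
    )


prompt = build_revision_prompt(
    paper_context="(abridged paper text)",
    section="Our method achieves good results.",
    criterion="conceptual coherence across sections",
    instruction="Tie the claim back to the motivation in the introduction.",
)
print(prompt)
```

The point of the sketch is that the revision is conditioned on the whole paper, not just the target section, which is what distinguishes context-aware revision from surface-level polishing.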
Related papers
- Evaluating AI Grading on Real-World Handwritten College Mathematics: A Large-Scale Study Toward a Benchmark [9.922581736690159]
We present a large-scale empirical study of AI grading on real, handwritten calculus work from UC Irvine. Using OCR-conditioned large language models with structured, rubric-guided prompting, our system produces scores and formative feedback for thousands of free-response quiz submissions. In a setting with no single ground-truth label, we evaluate performance against official teaching-assistant grades, student surveys, and independent human review.
arXiv Detail & Related papers (2026-03-01T03:32:51Z) - Reward Modeling for Scientific Writing Evaluation [50.33952894976367]
It is critical to develop models that can be reliably deployed for evaluating diverse open-ended scientific writing tasks. We propose cost-efficient, open-source reward models tailored for scientific writing evaluation.
arXiv Detail & Related papers (2026-01-16T15:32:58Z) - RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension [65.81339691942757]
RPC-Bench is a large-scale question-answering benchmark built from review-rebuttal exchanges of high-quality computer science papers. We design a fine-grained taxonomy aligned with the scientific research flow to assess models' ability to understand and answer why, what, and how questions in scholarly contexts.
arXiv Detail & Related papers (2026-01-14T11:37:00Z) - SciFig: Towards Automating Scientific Figure Generation [41.73701976318102]
SciFig is an end-to-end AI agent system that generates publication-ready pipeline figures directly from research paper texts. We introduce a rubric-based evaluation framework that analyzes 2,219 real scientific figures to extract evaluation rubrics. SciFig demonstrates remarkable performance, achieving 70.1% overall quality on dataset-level evaluation and 66.2% on paper-specific evaluation.
arXiv Detail & Related papers (2026-01-07T20:56:58Z) - Expert Preference-based Evaluation of Automated Related Work Generation [54.29459509574242]
We propose GREP, a multi-turn evaluation framework that integrates classical related work evaluation criteria with expert-specific preferences. For better accessibility, we design two variants of GREP: a more precise variant with proprietary LLMs as evaluators, and a cheaper alternative with open-weight LLMs.
arXiv Detail & Related papers (2025-08-11T13:08:07Z) - Navigating Through Paper Flood: Advancing LLM-based Paper Evaluation through Domain-Aware Retrieval and Latent Reasoning [30.92327406304362]
We present PaperEval, a novel framework for automated paper evaluation using Large Language Models (LLMs). PaperEval has two key components: 1) a domain-aware paper retrieval module that retrieves relevant concurrent work to support contextualized assessments of novelty and contributions, and 2) a latent reasoning mechanism that enables deep understanding of complex motivations and methodologies. Experiments on two datasets demonstrate that PaperEval consistently outperforms existing methods in both academic impact and paper quality evaluation.
arXiv Detail & Related papers (2025-08-07T08:08:13Z) - SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks [87.29946641069068]
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks. We release SciArena-Eval, a meta-evaluation benchmark based on our collected preference data.
arXiv Detail & Related papers (2025-07-01T17:51:59Z) - AutoRev: Automatic Peer Review System for Academic Research Papers [9.269282930029856]
AutoRev is an Automatic Peer Review System for Academic Research Papers. Our framework represents an academic document as a graph, enabling the extraction of the most critical passages. When applied to review generation, our method outperforms SOTA baselines by an average of 58.72%.
arXiv Detail & Related papers (2025-05-20T13:59:58Z) - ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [45.57178343138677]
We introduce ScholarCopilot, a unified framework designed to enhance existing large language models for academic writing. ScholarCopilot determines when to retrieve scholarly references by generating a retrieval token [RET], which is then used to query a citation database. We jointly optimize both the generation and citation tasks within a single framework to improve efficiency.
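The [RET] retrieval-token mechanism summarized above can be sketched as a generate-retrieve-resume loop: decoding pauses whenever the model emits [RET], the text so far is used to query a citation database, and the returned citation replaces the token before generation resumes. Everything below is a simplified mock (the model and the citation database are stand-ins), not ScholarCopilot's actual implementation.

```python
RET = "[RET]"


def mock_generate(prefix: str) -> str:
    # Stand-in for an LLM decoding step: emits a retrieval
    # token mid-sentence, then continues once a citation exists.
    if "[Vaswani" not in prefix:
        return prefix + " Transformers underpin modern NLP " + RET
    return prefix + "and scale favorably with data."


def mock_citation_db(query: str) -> str:
    # Stand-in for a citation database keyed on the query text.
    return "[Vaswani et al., 2017] "


def generate_with_citations(prompt: str, max_rounds: int = 4) -> str:
    """Generate text, pausing at each [RET] token to insert a citation."""
    text = prompt
    for _ in range(max_rounds):
        text = mock_generate(text)
        if text.endswith(RET):
            # Query the database with the text generated so far,
            # then splice the retrieved citation in place of [RET].
            citation = mock_citation_db(text)
            text = text[: -len(RET)] + citation
        else:
            break
    return text


print(generate_with_citations("Related work:"))
```

The design point is that retrieval is interleaved with decoding rather than done once up front, so each query is conditioned on the exact sentence being written.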
arXiv Detail & Related papers (2025-04-01T14:12:14Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem.
We introduce a novel dataset comprised of 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt.
We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z) - ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [21.17856299966841]
This study introduces ResearchArena, a benchmark designed to evaluate large language models (LLMs) in conducting academic surveys. To support these opportunities, we construct an environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts, whereas editorial assistance often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.