Related papers: ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation

URL: http://arxiv.org/abs/2511.17689v1
Date: Fri, 21 Nov 2025 14:14:35 GMT
Title: ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation
Authors: Zi Wang, Xingqiao Wang, Sangah Lee, Xiaowei Xu
Abstract summary: ARISE is an Agentic Rubric-guided Iterative Survey Engine for automated generation and continuous refinement of academic survey papers. ARISE employs a modular architecture composed of specialized large language model agents, each mirroring distinct scholarly roles such as topic expansion, citation curation, literature summarization, manuscript drafting, and peer-review-based evaluation. ARISE consistently surpasses baseline methods across metrics of comprehensiveness, accuracy, formatting, and overall scholarly rigor.
Score: 7.437989615069771
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The rapid expansion of scholarly literature presents significant challenges in synthesizing comprehensive, high-quality academic surveys. Recent advancements in agentic systems offer considerable promise for automating tasks that traditionally require human expertise, including literature review, synthesis, and iterative refinement. However, existing automated survey-generation solutions often suffer from inadequate quality control, poor formatting, and limited adaptability to iterative feedback, which are core elements intrinsic to scholarly writing. To address these limitations, we introduce ARISE, an Agentic Rubric-guided Iterative Survey Engine designed for automated generation and continuous refinement of academic survey papers. ARISE employs a modular architecture composed of specialized large language model agents, each mirroring distinct scholarly roles such as topic expansion, citation curation, literature summarization, manuscript drafting, and peer-review-based evaluation. Central to ARISE is a rubric-guided iterative refinement loop in which multiple reviewer agents independently assess manuscript drafts using a structured, behaviorally anchored rubric, systematically enhancing the content through synthesized feedback. Evaluating ARISE against state-of-the-art automated systems and recent human-written surveys, our experimental results demonstrate superior performance, achieving an average rubric-aligned quality score of 92.48. ARISE consistently surpasses baseline methods across metrics of comprehensiveness, accuracy, formatting, and overall scholarly rigor. All code, evaluation rubrics, and generated outputs are provided openly at https://github.com/ziwang11112/ARISE
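The rubric-guided refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the ARISE implementation (see the linked repository for that). The names `Review`, `refine`, `toy_reviewer`, and `toy_reviser`, the 0-100 scale, the target score of 90, and the round limit are all assumptions made for illustration, and the stub functions stand in for what would be language model calls.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical rubric criteria, named after the metrics listed in the abstract.
RUBRIC = ["comprehensiveness", "accuracy", "formatting", "rigor"]


@dataclass
class Review:
    scores: Dict[str, float]  # criterion -> score on an assumed 0-100 scale
    comments: List[str]       # free-text feedback for the reviser


Reviewer = Callable[[str], Review]
Reviser = Callable[[str, List[str]], str]


def refine(draft: str, reviewers: List[Reviewer], revise: Reviser,
           target: float = 90.0, max_rounds: int = 5) -> Tuple[str, List[float]]:
    """Review, synthesize feedback, and revise until the target score is met."""
    history: List[float] = []
    for _ in range(max_rounds):
        # Each reviewer assesses the same draft independently.
        reviews = [reviewer(draft) for reviewer in reviewers]
        # Average over criteria per reviewer, then over reviewers.
        score = mean(mean(rv.scores[c] for c in RUBRIC) for rv in reviews)
        history.append(score)
        if score >= target:
            break
        # Synthesize feedback: deduplicate comments across reviewers.
        feedback = sorted({c for rv in reviews for c in rv.comments})
        draft = revise(draft, feedback)
    return draft, history


def toy_reviewer(offset: float) -> Reviewer:
    """Stub reviewer: the score grows with the number of revisions applied."""
    def review(draft: str) -> Review:
        base = min(100.0, 70.0 + offset + 10.0 * draft.count("[revised]"))
        comments = [f"improve {c}" for c in RUBRIC] if base < 100.0 else []
        return Review(scores={c: base for c in RUBRIC}, comments=comments)
    return review


def toy_reviser(draft: str, feedback: List[str]) -> str:
    """Stub reviser: marks the draft as revised when there is feedback."""
    return draft + " [revised]" if feedback else draft


if __name__ == "__main__":
    final_draft, scores = refine(
        "Initial survey draft.", [toy_reviewer(0.0), toy_reviewer(4.0)], toy_reviser
    )
    print(scores)
    print(final_draft)
```

In this sketch the loop stops either when the averaged rubric score reaches the target or when the round limit is hit, so a draft that never improves cannot cause unbounded iteration.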

Related papers

ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review [48.60540055009675]
ScholarPeer is a search-enabled multi-agent framework designed to emulate the cognitive processes of a senior researcher. We evaluate ScholarPeer on DeepReview-13K and the results demonstrate that ScholarPeer achieves significant win-rates against state-of-the-art approaches in side-by-side evaluations.
arXiv Detail & Related papers (2026-01-30T06:54:55Z)
Author-in-the-Loop Response Generation and Evaluation: Integrating Author Expertise and Intent in Responses to Peer Review [53.99984738447279]
Recent work frames this task as automatic text generation, underusing author expertise and intent. We introduce REspGen, a generation framework that integrates explicit author input, multi-attribute control, and evaluation-guided refinement. To support this formulation, we construct Re$^3$Align, the first large-scale dataset of aligned review-response-revision triplets.
arXiv Detail & Related papers (2026-01-19T14:07:10Z)
DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce DeepSynth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities. We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization). Our results demonstrate that agentic plan-and-write significantly outperforms single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z)
AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys [10.50820843303237]
This paper introduces AutoSurvey2, a multi-stage pipeline that automates survey generation through retrieval-augmented synthesis and structured evaluation. The system integrates parallel section generation, iterative refinement, and real-time retrieval of recent publications to ensure both topical completeness and factual accuracy. Experimental results demonstrate that AutoSurvey2 consistently outperforms existing retrieval-based and automated baselines.
arXiv Detail & Related papers (2025-10-29T22:57:03Z)
LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review Generation [66.09346158850308]
We present LiRA (Literature Review Agents), a multi-agent collaborative workflow which emulates the human literature review process. LiRA utilizes specialized agents for content outlining, subsection writing, editing, and reviewing, producing cohesive and comprehensive review articles. We evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to reviewer model variation.
arXiv Detail & Related papers (2025-10-01T12:14:28Z)
Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries. We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z)
SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation [37.921524136479825]
SurGE (Survey Generation Evaluation) is a new benchmark for scientific survey generation in computer science. SurGE consists of (1) a collection of test instances, each including a topic description, an expert-written survey, and its full set of cited references, and (2) a large-scale academic corpus of over one million papers. In addition, we propose an automated evaluation framework that measures the quality of generated surveys across four dimensions.
arXiv Detail & Related papers (2025-08-21T15:45:10Z)
AutoRev: Multi-Modal Graph Retrieval for Automated Peer-Review Generation [5.72767946092813]
AutoRev is an automatic peer-review system designed to provide actionable, high-quality feedback to both reviewers and authors. By modelling documents as graphs, AutoRev effectively retrieves the most pertinent information. We envision AutoRev as a powerful tool to streamline the peer-review workflow, alleviating challenges and enabling scalable, high-quality scholarly publishing.
arXiv Detail & Related papers (2025-05-20T13:59:58Z)
AI-generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity [13.371946973050845]
We examine and benchmark the characteristics and quality of essays generated by popular large language models (LLMs). Our findings highlight limitations in existing automated scoring systems, and identify areas for improvement. Despite concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy.
arXiv Detail & Related papers (2024-10-22T21:30:58Z)
RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance [0.8089605035945486]
We propose RelevAI-Reviewer, an automatic system that conceptualizes the task of survey paper review as a classification problem. We introduce a novel dataset comprised of 25,164 instances. Each instance contains one prompt and four candidate papers, each varying in relevance to the prompt. We develop a machine learning (ML) model capable of determining the relevance of each paper and identifying the most pertinent one.
arXiv Detail & Related papers (2024-06-13T06:42:32Z)
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
System for systematic literature review using multiple AI agents: Concept and an empirical evaluation [3.453564255183234]
Systematic literature review (SLR) is foundational to evidence-based research. We introduce a novel multi-AI-agent system designed to fully automate SLRs. We show how it substantially reduces the time and effort traditionally required for SLRs while maintaining comprehensiveness and precision.
arXiv Detail & Related papers (2024-03-13T10:27:52Z)
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models [72.57329554067195]
ProxyQA is an innovative framework dedicated to assessing long-form text generation. It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers. It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
arXiv Detail & Related papers (2024-01-26T18:12:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.