SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
- URL: http://arxiv.org/abs/2503.04629v1
- Date: Thu, 06 Mar 2025 17:15:48 GMT
- Title: SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
- Authors: Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai
- Abstract summary: We introduce SurveyForge, which generates the outline by analyzing the logical structure of human-written outlines. To achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
- Score: 13.101632066188532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to the retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge can automatically generate and refine the content of the generated article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
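The abstract describes a two-stage flow: outline generation guided by the structure of human-written outlines plus retrieved domain articles, followed by memory-driven section writing and refinement via a scholar navigation agent. A minimal Python sketch of that flow is given below; every name here (llm, generate_outline, ScholarNavigator, write_survey) is a hypothetical placeholder, not the paper's actual code or API, and the LLM call is stubbed.

```python
# Hypothetical sketch of the two-stage flow described in the abstract;
# none of these names come from SurveyForge's code, and the LLM is stubbed.

def llm(prompt: str) -> str:
    """Stand-in for any chat/completion model call."""
    return f"[LLM output for: {prompt[:40]}...]"

def generate_outline(topic: str, human_outlines: list[str], articles: list[str]) -> str:
    # Stage 1: condition outline generation on the logical structure of
    # human-written survey outlines and on retrieved domain-related articles.
    prompt = (
        f"Topic: {topic}\n"
        "Example human-written outlines:\n" + "\n".join(human_outlines) +
        "\nRelevant articles:\n" + "\n".join(articles) +
        "\nDraft a survey outline mirroring the structure of the examples."
    )
    return llm(prompt)

class ScholarNavigator:
    """Toy paper memory; the real agent retrieves high-quality papers."""

    def __init__(self, memory: dict[str, str]):
        self.memory = memory  # paper title -> abstract

    def retrieve(self, heading: str, k: int = 3) -> list[str]:
        # Naive keyword overlap stands in for the agent's retrieval logic.
        words = heading.lower().split()
        scored = sorted(self.memory.items(),
                        key=lambda kv: -sum(w in kv[1].lower() for w in words))
        return [title for title, _ in scored[:k]]

def write_survey(topic, human_outlines, articles, navigator):
    # Stage 2: draft each outlined section from retrieved papers, then refine.
    outline = generate_outline(topic, human_outlines, articles)
    sections = []
    for heading in outline.splitlines():
        papers = navigator.retrieve(heading)
        draft = llm(f"Write section '{heading}' citing: {papers}")
        sections.append(llm(f"Refine for coherence and citation accuracy:\n{draft}"))
    return "\n\n".join(sections)

nav = ScholarNavigator({"Paper A": "outline heuristics for automated surveys"})
print(write_survey("automated survey writing", ["1. Intro\n2. Methods"], ["..."], nav))
```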
Related papers
- InteractiveSurvey: An LLM-based Personalized and Interactive Survey Paper Generation System [29.924809109589518]
Large language models (LLMs) and retrieval-augmented generation (RAG) facilitate studies in synthesizing survey papers from multiple references.
In this paper, we introduce InteractiveSurvey - an LLM-based personalized and interactive survey paper generation system.
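Since this entry attributes the system to LLMs plus retrieval-augmented generation, a minimal RAG loop for synthesizing text from multiple references might look like the sketch below. The embedding and generation calls are stubbed, and every identifier is illustrative rather than taken from InteractiveSurvey:

```python
# Illustrative RAG loop, not InteractiveSurvey's implementation.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag_answer(query: str, references: list[str], llm, k: int = 2) -> str:
    # Retrieve the k references most similar to the query, then generate.
    q = embed(query)
    top = sorted(references, key=lambda r: cosine(q, embed(r)), reverse=True)[:k]
    context = "\n---\n".join(top)
    return llm(f"Using these references:\n{context}\n\nWrite about: {query}")

# Usage with a stubbed model:
print(rag_answer("survey generation with LLMs",
                 ["RAG combines retrieval with generation.",
                  "LLMs can draft long-form scientific text."],
                 llm=lambda p: "[draft grounded in context]"))
```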
arXiv Detail & Related papers (2025-03-31T04:23:22Z)
- Personalized Generation In Large Model Era: A Survey [90.7579254803302]
In the era of large models, content generation is gradually shifting to Personalized Generation (PGen).
This paper presents the first comprehensive survey on PGen, investigating existing research in this rapidly growing field.
By bridging PGen research across multiple modalities, this survey serves as a valuable resource for fostering knowledge sharing and interdisciplinary collaboration.
arXiv Detail & Related papers (2025-03-04T13:34:19Z) - SurveyX: Academic Survey Automation via Large Language Models [22.597703631935463]
SurveyX is an efficient and organized system for automated survey generation.
It decomposes the survey composing process into two phases: Preparation and Generation.
It significantly enhances the efficacy of survey composition.
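A schematic reading of that two-phase decomposition follows; only the phase names Preparation and Generation come from the abstract, and both function bodies are invented for illustration:

```python
# Schematic Preparation/Generation split; function bodies are invented.

def preparation(topic: str, corpus: list[str]) -> dict:
    # Phase 1: gather and organize material before any writing happens.
    relevant = [doc for doc in corpus if topic.lower() in doc.lower()]
    return {"topic": topic, "materials": relevant}

def generation(prepared: dict, llm) -> str:
    # Phase 2: compose the survey from the prepared materials.
    context = "\n".join(prepared["materials"])
    return llm(f"Write a survey on {prepared['topic']} using:\n{context}")

survey = generation(preparation("LLM agents", ["LLM agents plan and act.",
                                               "Unrelated paper."]),
                    llm=lambda p: "[survey text]")
```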
arXiv Detail & Related papers (2025-02-20T17:59:45Z)
- Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
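One plausible way to realize a single auxiliary planning task inside one generation turn is to fine-tune on targets that prepend a plan to the document, so no second prompting round is needed. This data-construction sketch is an assumption about the setup, not the paper's released code:

```python
# Hypothetical construction of fine-tuning examples where the target
# combines a plan and the final document in a single turn.

def make_training_example(instruction: str, plan: list[str], document: str) -> dict:
    target = "PLAN:\n" + "\n".join(f"- {step}" for step in plan)
    target += "\n\nDOCUMENT:\n" + document
    # A single (input, target) pair: the model learns to emit the plan
    # and then the long-form text without extra prompting rounds.
    return {"input": instruction, "target": target}

example = make_training_example(
    instruction="Write a survey section on retrieval-augmented generation.",
    plan=["Define RAG", "Summarize key systems", "Discuss open problems"],
    document="Retrieval-augmented generation (RAG) combines ...",
)
print(example["target"][:80])
```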
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
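The generation- versus encoding-based distinction can be made concrete with a toy contrast: either prompt the model to emit a label as text, or use the model as a feature extractor and attach a classification head. The sketch below uses invented stubs and example labels, not the paper's framework:

```python
# Toy contrast between the two approaches; stubs replace a real LLM.
import random

LABELS = ["grammar", "clarity", "fact-update"]  # example edit-intent labels

def generation_based(edit_text: str, llm) -> str:
    # Ask the model to produce the label directly as text.
    out = llm(f"Classify the edit intent of: {edit_text}\nLabels: {LABELS}")
    return out if out in LABELS else LABELS[0]  # fall back on parse failure

def encoding_based(edit_text: str, encode, head) -> str:
    # Encode the edit into features, then classify with a separate head.
    features = encode(edit_text)
    scores = head(features)
    return LABELS[max(range(len(LABELS)), key=lambda i: scores[i])]

# Stubbed usage:
print(generation_based("Fixed a typo.", llm=lambda p: "grammar"))
print(encoding_based("Fixed a typo.",
                     encode=lambda t: [len(t), t.count(" ")],
                     head=lambda f: [random.random() for _ in LABELS]))
```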
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- Instruct Large Language Models to Generate Scientific Literature Survey Step by Step [21.149406605689297]
We design prompts to systematically leverage large language models (LLMs).
We argue that this design enables the generation of the headings from a high-level perspective.
Our implementation with Qwen-long achieved third place in the NLPCC 2024 Scientific Literature Survey Generation evaluation task.
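A guess at what such step-wise prompting could look like: first ask only for high-level headings, then expand each heading in a separate call. The prompt strings and function names below are illustrative, not the NLPCC submission's actual prompts:

```python
# Illustrative two-step prompting; not the actual competition prompts.

def generate_headings(topic: str, llm) -> list[str]:
    # Step 1: ask only for high-level section headings.
    out = llm(f"List 5 top-level section headings for a survey on {topic}, "
              "one per line, no details.")
    return [line.strip() for line in out.splitlines() if line.strip()]

def expand_section(topic: str, heading: str, llm) -> str:
    # Step 2: expand each heading into a section in a separate call.
    return llm(f"Survey on {topic}. Write the section titled '{heading}'.")

fake_llm = lambda p: "Introduction\nMethods\nApplications" if "headings" in p else "..."
for h in generate_headings("scientific literature survey generation", fake_llm):
    print(h, "->", expand_section("survey generation", h, fake_llm)[:20])
```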
arXiv Detail & Related papers (2024-08-15T02:07:11Z)
- ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [21.17856299966841]
This study introduces ResearchArena, a benchmark designed to evaluate large language models (LLMs) in conducting academic surveys.
To support this evaluation, we construct an environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z)
- AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
arXiv Detail & Related papers (2024-06-10T12:56:06Z)
- PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models [72.57329554067195]
ProxyQA is an innovative framework dedicated to assessing long-form text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
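The protocol reduces to a measurable loop: give the generated long-form text to an evaluator model, ask it each proxy-question, and score accuracy against the pre-annotated answers. A minimal rendering of that loop, with the evaluator stubbed and all names invented here:

```python
# Minimal rendering of proxy-question scoring; evaluator is stubbed.

def proxyqa_score(generated_text: str,
                  proxy_qas: list[tuple[str, str]],
                  evaluator) -> float:
    # The evaluator answers each proxy-question using ONLY the generated
    # text; accuracy against pre-annotated answers is the quality score.
    correct = 0
    for question, gold_answer in proxy_qas:
        predicted = evaluator(f"Text:\n{generated_text}\n\nQ: {question}\nA:")
        correct += predicted.strip().lower() == gold_answer.strip().lower()
    return correct / len(proxy_qas)

qas = [("What does RAG stand for?", "retrieval-augmented generation")]
print(proxyqa_score("RAG means retrieval-augmented generation.",
                    qas,
                    evaluator=lambda p: "retrieval-augmented generation"))
```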
arXiv Detail & Related papers (2024-01-26T18:12:25Z)
- An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics [33.655334920298856]
We provide a comprehensive overview of the research on long document summarization.
We conduct an empirical analysis to broaden the perspective on current research progress.
arXiv Detail & Related papers (2022-07-03T02:57:22Z)