SurveyX: Academic Survey Automation via Large Language Models
- URL: http://arxiv.org/abs/2502.14776v1
- Date: Thu, 20 Feb 2025 17:59:45 GMT
- Title: SurveyX: Academic Survey Automation via Large Language Models
- Authors: Xun Liang, Jiawei Yang, Yezhaohui Wang, Chen Tang, Zifan Zheng, Simin Niu, Shichao Song, Hanyu Wang, Bo Tang, Feiyu Xiong, Keming Mao, Zhiyu Li
- Abstract summary: SurveyX is an efficient and organized system for automated survey generation.
It decomposes the survey composing process into two phases: Preparation and Generation.
It significantly enhances the efficacy of survey composition.
- Score: 23.142476414919653
- License:
- Abstract: Large Language Models (LLMs) have demonstrated exceptional comprehension capabilities and a vast knowledge base, suggesting that LLMs can serve as efficient tools for automated survey generation. However, recent research on automated survey generation remains constrained by critical limitations such as a finite context window, a lack of in-depth content discussion, and the absence of systematic evaluation frameworks. Inspired by human writing processes, we propose SurveyX, an efficient and organized system for automated survey generation that decomposes the survey composing process into two phases: the Preparation and Generation phases. By innovatively introducing online reference retrieval, a pre-processing method called AttributeTree, and a re-polishing process, SurveyX significantly enhances the efficacy of survey composition. Experimental evaluation results show that SurveyX outperforms existing automated survey generation systems in content quality (0.259 improvement) and citation quality (1.76 enhancement), approaching human expert performance across multiple evaluation dimensions. Examples of surveys generated by SurveyX are available at www.surveyx.cn.
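As a rough illustration of the two-phase workflow the abstract describes (retrieval and AttributeTree pre-processing in Preparation, drafting and re-polishing in Generation), here is a minimal Python sketch. All class and function names are hypothetical; the paper does not publish this code.

```python
# Hypothetical sketch of the two-phase survey pipeline described in the abstract.
# None of these names come from the SurveyX codebase; they only illustrate the flow.
from dataclasses import dataclass, field


@dataclass
class Reference:
    title: str
    abstract: str
    attribute_tree: dict = field(default_factory=dict)  # filled by pre-processing


def retrieve_references(topic: str) -> list[Reference]:
    """Preparation phase, step 1: online reference retrieval (stubbed here)."""
    return [Reference(title=f"Paper on {topic}", abstract="...")]


def build_attribute_tree(ref: Reference) -> Reference:
    """Preparation phase, step 2: distill each reference into a structured
    'AttributeTree' of key attributes (problem, method, results)."""
    ref.attribute_tree = {"problem": "...", "method": "...", "results": "..."}
    return ref


def generate_survey(topic: str, refs: list[Reference]) -> str:
    """Generation phase: draft the survey from the attribute trees."""
    body = "\n".join(str(r.attribute_tree) for r in refs)
    return f"# Survey on {topic}\n{body}"


def repolish(draft: str) -> str:
    """Generation phase: re-polishing pass over the full draft."""
    return draft.strip()


def surveyx_pipeline(topic: str) -> str:
    refs = [build_attribute_tree(r) for r in retrieve_references(topic)]
    return repolish(generate_survey(topic, refs))


if __name__ == "__main__":
    print(surveyx_pipeline("retrieval-augmented generation"))
```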
Related papers
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.
As a testbed, we use country-level results from two global cultural surveys.
We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
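Loosely illustrating the idea of matching first-token probabilities to an observed response distribution, the toy sketch below compares a softmax over made-up first-token logits with a made-up survey distribution via KL divergence; it is an assumption-laden illustration, not the paper's fine-tuning code.

```python
import math

# Hypothetical example: compare a model's first-token probabilities for the
# answer options of one survey question against the observed response
# distribution, using KL divergence as the quantity to minimize.

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """KL(p || q): divergence of the predicted distribution q from the target p."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Logits the model assigns to the first token of each option ("A", "B", "C", "D");
# the numbers are made up for illustration.
first_token_logits = [2.1, 0.3, -0.5, 1.0]
predicted = softmax(first_token_logits)

# Observed country-level response distribution for the same question (made up).
observed = [0.40, 0.25, 0.10, 0.25]

print(f"KL(observed || predicted) = {kl_divergence(observed, predicted):.4f}")
```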
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
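As a rough, hypothetical sketch of what LLM-simulated crowdsourcing judgments might look like in code: `call_llm`, the prompt wording, and the 1-5 rating scale are illustrative assumptions, not AGENT-CQ's actual interface.

```python
# Ask an LLM "judge" for several independent ratings of a clarifying question
# and aggregate them, mimicking crowdsourced annotation. `call_llm` is a
# stand-in for any chat-completion API.
from statistics import mean

JUDGE_PROMPT = (
    "You are simulating a crowd worker. Rate how clear and useful the following "
    "clarifying question is for the given search query, on a 1-5 scale. "
    "Reply with a single integer.\n\nQuery: {query}\nClarifying question: {question}"
)

def call_llm(prompt: str) -> str:  # stub; replace with a real API call
    return "4"

def crowd_rating(query: str, question: str, n_workers: int = 5) -> float:
    votes = []
    for _ in range(n_workers):
        reply = call_llm(JUDGE_PROMPT.format(query=query, question=question))
        votes.append(int(reply.strip()))
    return mean(votes)

print(crowd_rating("best laptop", "Do you have a preferred budget or brand?"))
```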
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation [29.81362106367831]
Existing evaluation methods often suffer from high costs, limited test formats, the need for human references, and systematic evaluation biases.
In contrast to previous studies that rely on human annotations, Auto-PRE selects evaluators automatically based on their inherent traits.
Experimental results indicate our Auto-PRE achieves state-of-the-art performance at a lower cost.
arXiv Detail & Related papers (2024-10-16T06:06:06Z)
- RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework [69.4501863547618]
This paper introduces RAGEval, a framework designed to assess RAG systems across diverse scenarios.
With a focus on factual accuracy, we propose three novel metrics: Completeness, Hallucination, and Irrelevance.
Experimental results show that RAGEval outperforms zero-shot and one-shot methods in terms of clarity, safety, conformity, and richness of generated samples.
arXiv Detail & Related papers (2024-08-02T13:35:11Z)
- AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
arXiv Detail & Related papers (2024-06-10T12:56:06Z)
- Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as Llama-2 and Mistral.
This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
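One common way to estimate distribution-level precision and recall for generated text is k-nearest-neighbour support estimation in an embedding space; the toy sketch below illustrates that general idea and is not necessarily the exact estimator used in this paper.

```python
# Precision ~ fraction of generated embeddings that fall inside the estimated
# support of real text; recall ~ the symmetric quantity. All data here is toy.
import numpy as np

def knn_radii(points: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour within `points`."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # column 0 is the zero distance to itself

def coverage(queries: np.ndarray, support: np.ndarray, radii: np.ndarray) -> float:
    """Fraction of `queries` falling inside any ball around `support` points."""
    d = np.linalg.norm(queries[:, None, :] - support[None, :, :], axis=-1)
    return float(np.mean((d <= radii[None, :]).any(axis=1)))

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 8))        # embeddings of human-written text (toy)
generated = rng.normal(size=(200, 8))   # embeddings of model outputs (toy)

precision = coverage(generated, real, knn_radii(real))    # quality proxy
recall = coverage(real, generated, knn_radii(generated))  # diversity proxy
print(f"precision={precision:.2f} recall={recall:.2f}")
```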
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
- Emerging Results on Automated Support for Searching and Selecting Evidence for Systematic Literature Review Updates [1.1153433121962064]
We present emerging results on an automated approach to support searching and selecting studies for SLR updates in Software Engineering.
We developed an automated tool prototype to perform the snowballing search technique and support selecting relevant studies for SLR updates using Machine Learning (ML) algorithms.
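A hedged sketch of the study-selection step this describes: train a simple text classifier on the original review's included/excluded studies, then score candidates found by snowballing. The library choice (scikit-learn) and the toy data are assumptions for illustration, not the authors' tool.

```python
# Rank snowballing candidates by predicted probability of being relevant,
# using a TF-IDF + logistic regression pipeline as an illustrative ML choice.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Titles/abstracts from the original review, labelled included (1) / excluded (0).
train_texts = [
    "test case prioritisation using reinforcement learning",
    "a survey of agile practices in small companies",
    "machine learning for regression test selection",
    "qualitative study of developer onboarding",
]
train_labels = [1, 0, 1, 0]

# Candidate studies found by backward/forward snowballing from included papers.
candidates = [
    "deep learning approaches to test suite reduction",
    "interview study on remote pair programming",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

for text, prob in zip(candidates, clf.predict_proba(candidates)[:, 1]):
    print(f"{prob:.2f}  {text}")
```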
arXiv Detail & Related papers (2024-02-07T23:39:20Z)
- PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models [72.57329554067195]
ProxyQA is an innovative framework dedicated to assessing long-form text generation.
It comprises in-depth human-curated meta-questions spanning various domains, each accompanied by specific proxy-questions with pre-annotated answers.
It assesses the generated content's quality through the evaluator's accuracy in addressing the proxy-questions.
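For illustration only, a minimal sketch of the proxy-question scoring loop described above; `ask_evaluator`, the answer-matching rule, and the example questions are hypothetical stand-ins rather than ProxyQA's implementation.

```python
# Score a generated long-form text by how many proxy-questions an evaluator
# model can answer correctly when given only that text as context.

proxy_questions = [
    {"question": "Which year was the transformer architecture introduced?", "answer": "2017"},
    {"question": "What mechanism replaces recurrence in transformers?", "answer": "self-attention"},
]

def ask_evaluator(context: str, question: str) -> str:  # stub for an LLM call
    return "2017" if "year" in question else "self-attention"

def proxyqa_score(generated_text: str, items: list[dict]) -> float:
    correct = 0
    for item in items:
        reply = ask_evaluator(generated_text, item["question"])
        correct += int(item["answer"].lower() in reply.lower())
    return correct / len(items)

print(proxyqa_score("...a long generated survey about transformers...", proxy_questions))
```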
arXiv Detail & Related papers (2024-01-26T18:12:25Z)
- CSMeD: Bridging the Dataset Gap in Automated Citation Screening for Systematic Literature Reviews [10.207938863784829]
We introduce CSMeD, a meta-dataset consolidating nine publicly released collections.
CSMeD serves as a comprehensive resource for training and evaluating the performance of automated citation screening models.
We introduce CSMeD-FT, a new dataset designed explicitly for evaluating the full text publication screening task.
arXiv Detail & Related papers (2023-11-21T09:36:11Z)