Related papers: Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper

Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper

URL: http://arxiv.org/abs/2508.14273v2
Date: Sat, 23 Aug 2025 13:15:20 GMT
Title: Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper
Authors: Krishna Garg, Firoz Shaik, Sambaran Bandyopadhyay, Cornelia Caragea,
Abstract summary: SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works.<n>We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems.<n>Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
Score: 64.50822834679101
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As researchers increasingly adopt LLMs as writing assistants, generating high-quality research paper introductions remains both challenging and essential. We introduce Scientific Introduction Generation (SciIG), a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works. Curating new datasets from NAACL 2025 and ICLR 2025 papers, we assess five state-of-the-art models, including both open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems, across multiple dimensions: lexical overlap, semantic similarity, content coverage, faithfulness, consistency, citation correctness, and narrative quality. Our comprehensive framework combines automated metrics with LLM-as-a-judge evaluations. Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness. Moreover, three-shot prompting consistently outperforms fewer-shot approaches. These findings provide practical insights into developing effective research writing assistants and set realistic expectations for LLM-assisted academic writing. To foster reproducibility and future research, we will publicly release all code and datasets.

Related papers

Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers.<n>We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers.<n>Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
LitLLMs, LLMs for Literature Review: Are we there yet? [15.785989492351684]
This paper explores the zero-shot abilities of recent Large Language Models in assisting with the writing of literature reviews based on an abstract.<n>For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper.<n>In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review.
arXiv Detail & Related papers (2024-12-15T01:12:26Z)
Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long form content. Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning. Our experiments demonstrate on two datasets from different domains, that LLMs fine-tuned with the auxiliary task generate higher quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z)
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks. This study focuses on the topic of LLMs assist NLP Researchers. To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [21.17856299966841]
This study introduces ResearchArena, a benchmark designed to evaluate large language models (LLMs) in conducting academic surveys.<n>To support these opportunities, we construct an environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z)
Exploring the Latest LLMs for Leaderboard Extraction [0.3072340427031969]
This paper investigates the efficacy of different LLMs-ralMist 7B, Llama GPT-4-Turbo and GPT-4.o in extracting leaderboard information from empirical AI research articles. Our study evaluates the performance of these models in generating (Task, Metric, Score) quadruples from research papers.
arXiv Detail & Related papers (2024-06-06T05:54:45Z)
LLMs as Meta-Reviewers' Assistants: A Case Study [4.345138609587135]
Large Language Models (LLMs) can be used to generate a controlled multi-perspective summary (MPS) of experts opinions.<n>This paper performs a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and PaLM2, to assist meta-reviewers in better comprehending experts perspectives.
arXiv Detail & Related papers (2024-02-23T20:14:16Z)
A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks. This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)
Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning [5.822010906632045]
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis.
arXiv Detail & Related papers (2023-07-05T10:15:07Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.