From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews
- URL: http://arxiv.org/abs/2512.11661v1
- Date: Fri, 12 Dec 2025 15:38:34 GMT
- Title: From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews
- Authors: Brenda Nogueira, Werner Geyer, Andrew Anderson, Toby Jia-Jun Li, Dongwhi Kim, Nuno Moniz, Nitesh V. Chawla,
- Abstract summary: We report a user study with researchers across multiple disciplines to characterize current practices, benefits, and pain points in using LLMs to investigate related work. We identified three recurring gaps: (i) lack of trust in outputs, (ii) persistent verification burden, and (iii) requiring multiple tools. This motivates our proposal of six design goals and a high-level framework that operationalizes them through improved related-papers visualization, verification at every step, and human-feedback alignment with generation-guided explanations.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) are increasingly embedded in academic writing practices. Although numerous studies have explored how researchers employ these tools for scientific writing, their concrete implementation, limitations, and design challenges within the literature review process remain underexplored. In this paper, we report a user study with researchers across multiple disciplines to characterize current practices, benefits, and pain points in using LLMs to investigate related work. We identified three recurring gaps: (i) lack of trust in outputs, (ii) persistent verification burden, and (iii) requiring multiple tools. This motivates our proposal of six design goals and a high-level framework that operationalizes them through improved related-papers visualization, verification at every step, and human-feedback alignment with generation-guided explanations. Overall, by grounding our work in the practical, day-to-day needs of researchers, we designed a framework that addresses these limitations and models real-world LLM-assisted writing, advancing trust through verifiable actions and fostering practical collaboration between researchers and AI systems.
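The abstract describes the framework only at a high level, so the following is a speculative sketch of its "verification at every step" design goal: LLM-generated claims enter a pending queue and are only accepted into the draft once a human- or retrieval-backed check confirms them. All class and function names below are illustrative assumptions, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    source_id: str          # identifier of the cited paper
    verified: bool = False

@dataclass
class ReviewDraft:
    accepted: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def propose(self, claim: Claim) -> None:
        # LLM output enters as "pending" -- never trusted by default.
        self.pending.append(claim)

    def verify(self, checker) -> None:
        # `checker` is a human- or retrieval-backed predicate;
        # only claims it confirms move into the accepted draft.
        still_pending = []
        for claim in self.pending:
            if checker(claim):
                claim.verified = True
                self.accepted.append(claim)
            else:
                still_pending.append(claim)
        self.pending = still_pending

draft = ReviewDraft()
draft.propose(Claim("LLMs reduce screening time.", source_id="arXiv:2512.11661"))
draft.verify(lambda c: c.source_id.startswith("arXiv:"))
print(len(draft.accepted), len(draft.pending))  # 1 0
```

The key design choice this sketch illustrates is that verification is a structural gate rather than an afterthought: nothing reaches the accepted draft without passing a check, which is one way to reduce the verification burden the study identifies.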
Related papers
- Deep Research: A Systematic Survey [118.82795024422722]
Deep Research (DR) aims to combine the reasoning capabilities of large language models with external tools, such as search engines. This survey presents a comprehensive and systematic overview of deep research systems.
arXiv Detail & Related papers (2025-11-24T15:28:28Z)
- Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization [86.98098988779809]
We propose SummQ, a novel adversarial multi-agent framework for long document summarization. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries. We evaluate SummQ on three widely used long document summarization benchmarks.
arXiv Detail & Related papers (2025-09-25T08:36:19Z) - Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper [64.50822834679101]
SciIG is a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works.<n>We assess five state-of-the-art models, including open-source (DeepSeek-v3, Gemma-3-12B, LLaMA 4-Maverick, MistralAI Small 3.1) and closed-source GPT-4o systems.<n>Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic similarity and faithfulness.
arXiv Detail & Related papers (2025-08-19T21:11:11Z) - Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim $\
ightarrow$ Evidence Reasoning [6.043212666944194]
We present CLAIM-BENCH, a benchmark for evaluating large language models' capabilities in scientific claim-evidence extraction and validation.<n>We show that closed-source models like GPT-4 and Claude consistently outperform open-source counterparts in precision and recall.<n> strategically designed three-pass and one-by-one prompting approaches significantly improve LLMs' abilities to accurately link dispersed evidence with claims.
arXiv Detail & Related papers (2025-06-09T21:04:39Z) - Large Language Models Penetration in Scholarly Writing and Peer Review [43.600778691549706]
We evaluate the penetration of Large Language Models across academic perspectives and dimensions.<n>Our experiments demonstrate the effectiveness of textttLLMetrica, revealing the increasing role of LLMs in scholarly processes.<n>These findings emphasize the need for transparency, accountability, and ethical practices in LLM usage to maintain academic credibility.
arXiv Detail & Related papers (2025-02-16T16:37:34Z) - Practical Considerations for Agentic LLM Systems [5.455744338342196]
This paper frames actionable insights and considerations from the research community in the context of established application paradigms.<n> Namely, we position relevant research findings into four broad categories--Planning, Memory Tools, and Control Flow--based on common practices in application-focused literature.
arXiv Detail & Related papers (2024-12-05T11:57:49Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.<n>ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.<n>We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z) - Acceleron: A Tool to Accelerate Research Ideation [15.578814192003437]
Acceleron is a research accelerator for different phases of the research life cycle.
It guides researchers through the formulation of a comprehensive research proposal, encompassing a novel research problem.
We leverage the reasoning and domain-specific skills of Large Language Models (LLMs) to create an agent-based architecture.
arXiv Detail & Related papers (2024-03-07T10:20:06Z) - Creativity Support in the Age of Large Language Models: An Empirical
Study Involving Emerging Writers [33.3564201174124]
We investigate the utility of modern large language models in assisting professional writers via an empirical user study.
We find that while writers seek LLM's help across all three types of cognitive activities, they find LLMs more helpful in translation and reviewing.
arXiv Detail & Related papers (2023-09-22T01:49:36Z) - Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of supervised fine-tuning (SFT)<n>We also review the potential pitfalls of SFT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.