Accelerating Discovery: Rapid Literature Screening with LLMs
- URL: http://arxiv.org/abs/2509.13103v1
- Date: Tue, 16 Sep 2025 14:01:44 GMT
- Title: Accelerating Discovery: Rapid Literature Screening with LLMs
- Authors: Santiago Matalonga, Domenico Amalfitano, Jean Carlo Rossa Hauck, Martín Solari, Guilherme H. Travassos
- Abstract summary: Researchers must review and filter a large number of unstructured sources, which frequently contain sparse information. We developed a Large Language Model (LLM) assistant to support the search and filtering of documents.
- Score: 1.2586771241101986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Conducting Multi Vocal Literature Reviews (MVLRs) is often time- and effort-intensive. Researchers must review and filter a large number of unstructured sources, which frequently contain sparse information and are unlikely to be included in the final study. Our experience conducting an MVLR on Context-Aware Software Systems (CASS) Testing in the avionics domain exemplified this challenge, with over 8,000 highly heterogeneous documents requiring review. Therefore, we developed a Large Language Model (LLM) assistant to support the search and filtering of documents. Aims: To develop and validate an LLM-based tool that can support researchers in performing the search and filtering of documents for an MVLR without compromising the rigor of the research protocol. Method: We applied sound engineering practices to develop an on-premises LLM-based tool incorporating Retrieval Augmented Generation (RAG) to process candidate sources. Progress towards the aim was quantified using the Positive Percent Agreement (PPA) as the primary metric to ensure the performance of the LLM-based tool. Convenience sampling, supported by human judgment and statistical sampling, was used to verify and validate the tool's quality-in-use. Results: The tool currently demonstrates a PPA of 90% with human researchers for sources that are not relevant to the study. Development details are shared to support domain-specific adaptation of the tool. Conclusions: Using LLM-based tools to support academic researchers in conducting rigorous MVLRs is feasible. These tools can free valuable time for higher-level, abstract tasks. However, researcher participation remains essential to ensure that the tool supports thorough research.
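The abstract's primary metric, Percent Agreement for a given label, can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it treats the human researcher as the reference rater and computes, for one label (e.g. "not relevant"), the fraction of the reference's labels that the tool reproduces. The example labels are invented for illustration.

```python
def percent_agreement(reference, candidate, label):
    """Among items the reference rater assigns `label`,
    the fraction the candidate rater also assigns it."""
    ref_idx = [i for i, r in enumerate(reference) if r == label]
    if not ref_idx:
        raise ValueError("reference contains no items with this label")
    agree = sum(1 for i in ref_idx if candidate[i] == label)
    return agree / len(ref_idx)

# Hypothetical screening decisions: human vs. LLM tool
human = ["relevant", "not", "not", "not", "relevant"]
tool  = ["relevant", "not", "not", "relevant", "relevant"]
ppa_not = percent_agreement(human, tool, "not")  # 2 of 3 "not" labels agree
```

A 90% PPA on "not relevant" sources, as reported above, means the tool reproduces nine out of ten human exclusion decisions.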
Related papers
- LongDA: Benchmarking LLM Agents for Long-Document Data Analysis [55.32211515932351]
LongDA targets real-world settings in which navigating long documentation and complex data is the primary bottleneck. LongTA is a tool-augmented agent framework that enables document access, retrieval, and code execution. Our experiments reveal substantial performance gaps even among state-of-the-art models.
arXiv Detail & Related papers (2026-01-05T23:23:16Z) - Software Testing with Large Language Models: An Interview Study with Practitioners [2.198430261120653]
The use of large language models in software testing is growing fast as they support numerous tasks. However, their adoption often relies on informal experimentation rather than structured guidance. This study investigates how software testing professionals use LLMs in practice to propose a preliminary, practitioner-informed guideline.
arXiv Detail & Related papers (2025-10-20T05:06:56Z) - Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews [5.911820207772152]
We propose a pipeline leveraging multiple large language models (LLMs), classifying papers based on descriptive prompts and deciding jointly. The entire process is human-supervised and interactively controlled via our open-source visual analytics web interface, LLMSurver. Results demonstrate that our pipeline significantly reduces manual effort while achieving lower error rates than single human annotators.
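"Deciding jointly" across multiple LLM classifiers can be realized in several ways; a simple one is a majority vote with ties escalated to a human. The sketch below illustrates that policy only. It is an assumption for illustration, not the LLMSurver paper's actual aggregation rule, and the labels are hypothetical.

```python
from collections import Counter

def joint_decision(votes):
    """Majority vote over per-model include/exclude labels.
    Ties are escalated to human review (one plausible policy)."""
    counts = Counter(votes).most_common(2)
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "human_review"
    return counts[0][0]

joint_decision(["include", "exclude", "include"])  # majority wins: "include"
joint_decision(["include", "exclude"])             # tie: "human_review"
```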
arXiv Detail & Related papers (2025-10-13T13:48:29Z) - PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature [11.804526152911386]
We propose PaperArena, an evaluation benchmark for large language model (LLM) based agents to address real-world research questions. Given a research question, agents should integrate diverse formats across multiple papers through reasoning and interacting with appropriate tools. Experimental results reveal that even the most advanced LLM powering a well-established agent system achieves merely 38.78% average accuracy.
arXiv Detail & Related papers (2025-10-13T02:10:39Z) - MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research [70.72318131988102]
MLR-Bench is a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing.
arXiv Detail & Related papers (2025-05-26T13:18:37Z) - Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning [63.31585771716123]
Large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). We introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning. Tool-Star integrates six types of tools and incorporates systematic designs in both data synthesis and training.
arXiv Detail & Related papers (2025-05-22T09:00:19Z) - A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering [12.504438766461027]
Large Language Models (LLMs) have transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories. Our research packages a framework, coined Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES). Our findings indicate that standardizing prompt engineering and using PRIMES can enhance the reliability and accuracy of studies utilizing LLMs.
arXiv Detail & Related papers (2024-11-15T06:08:57Z) - From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions [60.733557487886635]
This paper focuses on bridging the comprehension gap between Large Language Models and external tools. We propose a novel framework, DRAFT, aimed at Dynamically Refining tool documentation. This methodology pivots on an innovative trial-and-error approach, consisting of three distinct learning phases.
arXiv Detail & Related papers (2024-10-10T17:58:44Z) - SWARM-SLR -- Streamlined Workflow Automation for Machine-actionable Systematic Literature Reviews [0.4915744683251149]
We propose the Streamlined Automation for Machine-actionable Systematic Literature Reviews (SWARM-SLR) to crowdsource the improvement of SLR efficiency.
Drawing on guidelines from the literature, we composed a set of 65 requirements, spanning from planning to reporting a review.
Existing tools were assessed against these requirements and synthesized into the SWARM-SLR workflow prototype, a ready-for-operation software support tool.
arXiv Detail & Related papers (2024-07-26T10:46:14Z) - AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z) - Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
Real-world systems often incorporate a wide array of tools, making it impractical to input all tools into Large Language Models.
Existing tool retrieval methods primarily focus on semantic matching between user queries and tool descriptions.
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT, which captures not only the semantic similarities between user queries and tool descriptions but also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z) - LLMDet: A Third Party Large Language Models Generated Text Detection Tool [119.0952092533317]
Large language models (LLMs) generate text remarkably close to high-quality human-authored text.
Existing detection tools can only differentiate between machine-generated and human-authored text.
We propose LLMDet, a model-specific, secure, efficient, and extendable detection tool.
arXiv Detail & Related papers (2023-05-24T10:45:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.