Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews
- URL: http://arxiv.org/abs/2407.10652v1
- Date: Mon, 15 Jul 2024 12:13:53 GMT
- Title: Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews
- Authors: Lucas Joos, Daniel A. Keim, Maximilian T. Fischer
- Abstract summary: Large Language Models (LLMs) can be used to enhance the efficiency, speed, and precision of literature review filtering.
We show that using advanced LLMs with simple prompting can significantly reduce the time required for literature filtering.
We also show that false negatives can indeed be controlled through a consensus scheme, achieving recalls >98.8% at or even beyond the typical human error threshold.
- Score: 7.355182982314533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In academic research, systematic literature reviews are foundational and highly relevant, yet tedious to create due to the high volume of publications and labor-intensive processes involved. Systematic selection of relevant papers through conventional means like keyword-based filtering techniques can sometimes be inadequate, plagued by semantic ambiguities and inconsistent terminology, which can lead to sub-optimal outcomes. To mitigate the required extensive manual filtering, we explore and evaluate the potential of using Large Language Models (LLMs) to enhance the efficiency, speed, and precision of literature review filtering, reducing the amount of manual screening required. By using models as classification agents acting on a structured database only, we prevent common problems inherent in LLMs, such as hallucinations. We evaluate the real-world performance of such a setup during the construction of a recent literature survey paper with initially more than 8.3k potentially relevant articles under consideration and compare this with human performance on the same dataset. Our findings indicate that employing advanced LLMs like GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Flash, or Llama3 with simple prompting can significantly reduce the time required for literature filtering, from what is usually weeks of manual research to only a few minutes. Simultaneously, we crucially show that false negatives can indeed be controlled through a consensus scheme, achieving recalls >98.8% at or even beyond the typical human error threshold, thereby also yielding a more accurate and relevant selection of articles. Our research not only demonstrates a substantial improvement in the methodology of literature reviews but also sets the stage for further integration and extensive future applications of responsible AI in academic research practices.
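A minimal sketch of the consensus scheme described in the abstract is shown below, assuming a keep-unless-all-models-reject rule; the prompt wording, the `Paper` type, and the `ask_model` helper are illustrative placeholders rather than the authors' released implementation.

```python
# Sketch of consensus-based LLM screening for an SLR corpus.
# The prompt wording and the unanimity rule for discarding a paper
# are assumptions for illustration, not the paper's exact setup.
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Paper:
    title: str
    abstract: str


PROMPT = (
    "You are screening papers for a literature survey on {topic}. "
    "Based only on the title and abstract, answer strictly with "
    "RELEVANT or IRRELEVANT.\n\nTitle: {title}\nAbstract: {abstract}"
)


def ask_model(model_name: str, prompt: str) -> bool:
    """Return True if the model answers RELEVANT.

    Placeholder: connect this to the provider SDK of your choice.
    """
    raise NotImplementedError


def consensus_filter(
    papers: Iterable[Paper], models: List[str], topic: str
) -> Tuple[List[Paper], List[Paper]]:
    """Keep a paper unless every model votes IRRELEVANT.

    Requiring unanimity to discard is what keeps false negatives
    (wrongly excluded papers) rare, at the cost of passing more
    candidates on to manual screening.
    """
    kept: List[Paper] = []
    discarded: List[Paper] = []
    for paper in papers:
        prompt = PROMPT.format(
            topic=topic, title=paper.title, abstract=paper.abstract
        )
        votes = [ask_model(m, prompt) for m in models]
        (kept if any(votes) else discarded).append(paper)
    return kept, discarded
```

Under this rule, raising the number of independent models tightens recall (fewer false exclusions) while typically lowering precision, which matches the trade-off the abstract describes.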
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z) - PROMPTHEUS: A Human-Centered Pipeline to Streamline SLRs with LLMs [0.0]
PROMPTHEUS is an AI-driven pipeline solution for Systematic Literature Reviews.
It automates key stages of the SLR process, including systematic search, data extraction, topic modeling, and summarization.
It achieves high precision, provides coherent topic organization, and reduces review time.
arXiv Detail & Related papers (2024-10-21T13:05:33Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - LLAssist: Simple Tools for Automating Literature Review Using Large Language Models [0.0]
LLAssist is an open-source tool designed to streamline literature reviews in academic research.
It uses Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process.
arXiv Detail & Related papers (2024-07-19T02:48:54Z) - ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary [30.409552944905915]
ChatCite is an LLM agent with human workflow guidance for comparative literature summary.
The ChatCite agent outperformed other models in various dimensions in the experiments.
The literature summaries generated by ChatCite can also be directly used for drafting literature reviews.
arXiv Detail & Related papers (2024-03-05T01:13:56Z) - Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System [47.13932723910289]
We introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) with three-step process stages.
It employs the hybrid modality preprocessing and alignment module to extract plain text and tables or figures from documents separately.
It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section.
It utilizes the extracted section names to divide the article into shorter text segments, facilitating specific summarizations both within and between sections via LLMs.
arXiv Detail & Related papers (2024-01-17T11:50:53Z) - Streamlining the Selection Phase of Systematic Literature Reviews (SLRs) Using AI-Enabled GPT-4 Assistant API [0.0]
This study introduces a pioneering AI-based tool, configured specifically to improve the efficiency of the article selection phase in Systematic Literature Reviews.
The tool successfully homogenizes the article selection process across a broad array of academic disciplines.
arXiv Detail & Related papers (2024-01-14T11:16:16Z) - Zero-shot Generative Large Language Models for Systematic Review Screening Automation [55.403958106416574]
This study investigates the effectiveness of using zero-shot large language models for automatic screening.
We evaluate the effectiveness of eight different LLMs and investigate a calibration technique that uses a predefined recall threshold.
arXiv Detail & Related papers (2024-01-12T01:54:08Z) - Can large language models replace humans in the systematic review process? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages [0.0]
This study evaluates GPT-4's capability in title/abstract screening, full-text review, and data extraction using a 'human-out-of-the-loop' approach.
GPT-4 had accuracy on par with human performance in most tasks, but results were skewed by chance agreement and dataset imbalance.
When screening full-text literature using highly reliable prompts, GPT-4's performance was 'almost perfect'.
arXiv Detail & Related papers (2023-10-26T16:18:30Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization [66.08074487429477]
We investigate the stability and reliability of large language models (LLMs) as automatic evaluators for abstractive summarization.
We find that while ChatGPT and GPT-4 outperform the commonly used automatic metrics, they are not ready as human replacements.
arXiv Detail & Related papers (2023-05-22T14:58:13Z)