AISysRev -- LLM-based Tool for Title-abstract Screening
- URL: http://arxiv.org/abs/2510.06708v1
- Date: Wed, 08 Oct 2025 06:59:23 GMT
- Title: AISysRev -- LLM-based Tool for Title-abstract Screening
- Authors: Aleksi Huotala, Miikka Kuutila, Olli-Pekka Turtio, Mika Mäntylä
- Abstract summary: AiSysRev is a web application running in a Docker container for screening papers. It accepts a CSV file containing paper titles and abstracts. Users specify inclusion and exclusion criteria. It supports both zero-shot and few-shot screening.
- Score: 0.7758046038799246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systematic reviews are a standard practice for summarizing the state of evidence in software engineering. Conducting systematic reviews is laborious, especially during the screening or study selection phase, where the number of papers can be overwhelming. During this phase, papers are assessed against inclusion and exclusion criteria based on their titles and abstracts. Recent research has demonstrated that large language models (LLMs) can perform title-abstract screening at a level comparable to that of a master's student. While LLMs cannot be fully trusted, they can help, for example, in Rapid Reviews, which try to expedite the review process. Building on recent research, we developed AiSysRev, an LLM-based screening tool implemented as a web application running in a Docker container. The tool accepts a CSV file containing paper titles and abstracts. Users specify inclusion and exclusion criteria. One can use multiple LLMs for screening via OpenRouter. AiSysRev supports both zero-shot and few-shot screening, and also allows for manual screening through interfaces that display LLM results as guidance for human reviewers. We conducted a trial study with 137 papers using the tool. Our findings indicate that papers can be classified into four categories: Easy Includes, Easy Excludes, Boundary Includes, and Boundary Excludes. The Boundary cases, where LLMs are prone to errors, highlight the need for human intervention. While LLMs do not replace human judgment in systematic reviews, they can significantly reduce the burden of assessing large volumes of scientific literature. Video: https://www.youtube.com/watch?v=jVbEj4Y4tQI Tool: https://github.com/EvoTestOps/AISysRev
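The screening workflow the abstract describes (user-defined inclusion/exclusion criteria, one prompt per paper, an include/exclude verdict) can be sketched as follows. This is a minimal illustrative sketch, not AISysRev's actual code: the function names, prompt wording, and the `needs-human-review` fallback label are assumptions.

```python
# Hypothetical sketch of zero-shot title-abstract screening:
# build a prompt from the review criteria plus one paper's title and
# abstract, then map the model's free-text verdict to a decision.

def build_screening_prompt(title, abstract, inclusion, exclusion):
    """Compose a zero-shot screening prompt for a chat-style LLM."""
    criteria = "\n".join(
        [f"INCLUDE if: {c}" for c in inclusion]
        + [f"EXCLUDE if: {c}" for c in exclusion]
    )
    return (
        "You are screening papers for a systematic review.\n"
        f"{criteria}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with a single word: INCLUDE or EXCLUDE."
    )

def parse_verdict(model_reply):
    """Map the model's reply to a decision; unclear replies are deferred."""
    reply = model_reply.strip().upper()
    if reply.startswith("INCLUDE"):
        return "include"
    if reply.startswith("EXCLUDE"):
        return "exclude"
    # Boundary case: hand off to manual screening, as the paper recommends.
    return "needs-human-review"

prompt = build_screening_prompt(
    "AISysRev -- LLM-based Tool for Title-abstract Screening",
    "Systematic reviews are a standard practice...",
    inclusion=["the paper concerns LLM-assisted screening"],
    exclusion=["the paper is not about software engineering research"],
)
```

In the real tool the prompt would be sent to one or more models through OpenRouter's OpenAI-compatible chat API, and the per-paper verdicts would be shown to reviewers as guidance rather than applied automatically.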
Related papers
- LLM-REVal: Can We Trust LLM Reviewers Yet? [70.58742663985652]
Large language models (LLMs) have inspired researchers to integrate them extensively into the academic workflow. This study focuses on how the deep integration of LLMs into both peer-review and research processes may influence scholarly fairness.
arXiv Detail & Related papers (2025-10-14T10:30:20Z)
- SESR-Eval: Dataset for Evaluating LLMs in the Title-Abstract Screening of Systematic Reviews [0.9421843976231371]
We create a benchmark dataset to evaluate the performance of large language models (LLMs) in the title-abstract screening process of systematic reviews (SRs). We present the SESR-Eval dataset containing 34,528 labeled primary studies from 24 secondary studies published in software engineering (SE) journals. Our benchmark enables monitoring AI performance in the screening task of SRs in software engineering.
arXiv Detail & Related papers (2025-07-25T07:27:03Z)
- LGAR: Zero-Shot LLM-Guided Neural Ranking for Abstract Screening in Systematic Literature Reviews [0.9314555897827079]
Systematic literature reviews aim to identify and evaluate all relevant papers on a topic. To date, abstract screening methods using large language models (LLMs) focus on binary classification settings. We propose LGAR, a zero-shot LLM-Guided Abstract Ranker composed of an LLM-based graded relevance scorer and a dense re-ranker.
arXiv Detail & Related papers (2025-05-30T16:18:50Z)
- AiReview: An Open Platform for Accelerating Systematic Reviews with LLMs [43.1999161587789]
AiReview is a novel platform for LLM-assisted systematic review creation. It is the first of its kind to bridge the gap between cutting-edge LLM-assisted screening methods and those that create medical systematic reviews.
arXiv Detail & Related papers (2025-04-05T14:55:43Z)
- LitLLMs, LLMs for Literature Review: Are we there yet? [15.785989492351684]
This paper explores the zero-shot abilities of recent Large Language Models in assisting with the writing of literature reviews based on an abstract. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review.
arXiv Detail & Related papers (2024-12-15T01:12:26Z)
- Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- Get my drift? Catching LLM Task Drift with Activation Deltas [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users. We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set. We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z)
- The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews [7.030989629685138]
Large Language Models (LLMs) can accelerate title-abstract screening by simplifying abstracts for human screeners.
We performed an experiment where humans screened titles and abstracts for 20 papers with both original and simplified abstracts from a prior Systematic Review.
We also studied if different prompting techniques (Zero-shot (ZS), One-shot (OS), Few-shot (FS), and Few-shot with Chain-of-Thought (FS-CoT)) improve the screening performance of LLMs.
arXiv Detail & Related papers (2024-04-24T05:53:20Z)
- LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question.
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z)
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [111.51612340032052]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks. This paper presents MME, the first comprehensive MLLM evaluation benchmark. It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.