Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
- URL: http://arxiv.org/abs/2504.14822v2
- Date: Mon, 02 Jun 2025 17:34:14 GMT
- Title: Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
- Authors: Rui Qiu, Shijie Chen, Yu Su, Po-Yin Yen, Han-Wei Shen
- Abstract summary: We introduce InsightAgent, a human-centered interactive AI agent powered by large language models. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs.
 - Score: 21.934330935124866
 - License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Systematic reviews (SRs) are vital for evidence-based practice in high-stakes disciplines, such as healthcare, but are often impeded by intensive labor and lengthy processes that can take months to complete. Due to the high demand for domain expertise, existing automatic summarization methods fail to accurately identify relevant studies and generate high-quality summaries. To that end, we introduce InsightAgent, a human-centered interactive AI agent powered by large language models that revolutionizes this workflow. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing of the literature, leading to significant improvements in the quality of generated SRs. InsightAgent also provides intuitive visualizations of the corpus and agent trajectories, allowing users to effortlessly monitor the agent's actions and provide real-time feedback based on their expertise. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs by 27.2%, reaching 79.7% of human-written quality. At the same time, user satisfaction improves by 34.4%. With InsightAgent, it takes a clinician only about 1.5 hours, rather than months, to complete a high-quality systematic review.
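The abstract does not include implementation details, but the corpus-partitioning idea it describes can be illustrated with a minimal, hypothetical sketch: embed the abstracts, cluster them into semantic partitions, and hand each partition to its own agent for focused summarization. Everything below (TF-IDF features, k-means clustering, and the `summarize_partition` placeholder) is an assumption for illustration, not InsightAgent's actual design.

```python
# Hypothetical sketch of semantic corpus partitioning for multi-agent review.
# Not InsightAgent's actual implementation; all names here are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def partition_corpus(abstracts, n_partitions=2):
    """Cluster abstracts into semantically similar groups (TF-IDF + k-means)."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labels = KMeans(n_clusters=n_partitions, n_init=10, random_state=0).fit_predict(vectors)
    partitions = {}
    for abstract, label in zip(abstracts, labels):
        partitions.setdefault(label, []).append(abstract)
    return partitions

def summarize_partition(docs):
    """Placeholder for a per-partition agent; a real system would call an LLM here."""
    return f"[summary of {len(docs)} related studies]"

corpus = [
    "Randomized trial of drug A for hypertension ...",
    "Meta-analysis of drug A dosing in older adults ...",
    "Qualitative study of nurse workflows in the ICU ...",
]
# Each partition gets its own focused draft, which a reviewer could then
# inspect and correct before the drafts are merged into a full SR.
drafts = [summarize_partition(docs) for docs in partition_corpus(corpus).values()]
print(drafts)
```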
 
       
      
        Related papers
- AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents [0.0]
This study presents a modular, multi-agent system for the automated review of highly structured enterprise business documents using AI agents. It uses modern orchestration tools such as LangChain, CrewAI, TruLens, and Guidance to enable section-by-section evaluation of documents. It achieves 99% information consistency (vs. 92% for humans), halving error and bias rates, and reducing average review time from 30 to 2.5 minutes per document.
arXiv Detail & Related papers (2025-06-23T17:46:15Z)
- Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism [48.41735416075536]
Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations.
arXiv Detail & Related papers (2025-06-10T18:43:26Z)
- ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies [16.90884865239373]
We introduce ResearchCodeAgent, a novel multi-agent system to automate the codification of research methodologies.
The system bridges the gap between high-level research concepts and their practical implementation.
ResearchCodeAgent represents a significant step towards automating the research implementation process, potentially accelerating the pace of machine learning research.
arXiv Detail & Related papers (2025-04-28T07:18:45Z)
- An Illusion of Progress? Assessing the Current State of Web Agents [49.76769323750729]
We conduct a comprehensive and rigorous assessment of the current state of web agents. Results depict a very different picture of the competency of current agents, suggesting over-optimism in previously reported results. We introduce Online-Mind2Web, an online evaluation benchmark consisting of 300 diverse and realistic tasks spanning 136 websites.
arXiv Detail & Related papers (2025-04-02T05:51:29Z)
- TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews [54.35097932763878]
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data. Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews. We demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness.
arXiv Detail & Related papers (2025-03-26T15:58:16Z)
- Reinforcing Clinical Decision Support through Multi-Agent Systems and Ethical AI Governance [0.0]
We compare novel agent system designs that use modular agents to analyze laboratory results, vital signs, and clinical context. We implement our agent system with the eICU database, including lab-analysis, vitals-only interpreter, and contextual-reasoner agents.
arXiv Detail & Related papers (2025-03-25T05:32:43Z)
- VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures [3.075266204492352]
Large language model (LLM) agents in compound AI systems often fail to meet human standards, leading to errors that compromise the system's overall performance. This paper introduces a human-centered evaluation framework for verifying LLM agent failures (VeriLA). VeriLA systematically assesses agent failures to reduce human effort and make these agent failures interpretable to humans.
arXiv Detail & Related papers (2025-03-16T21:11:18Z)
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- Using Generative AI and Multi-Agents to Provide Automatic Feedback [4.883570605293337]
This study investigates the use of generative AI and multi-agent systems to provide automatic feedback in educational contexts.
The research addresses a key gap in the field by exploring how a multi-agent system, called AutoFeedback, can improve the quality of GenAI-generated feedback.
arXiv Detail & Related papers (2024-11-11T22:27:36Z)
- Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
- Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification [0.0]
This paper explores the application of a Multi-Agent System (MAS) that utilizes specialized LLM agents to automate the Prior Authorization task.
We demonstrate that a GPT-4 checklist achieves an accuracy of 86.2% in predicting item-level judgments with evidence, and 95.6% in determining overall checklist judgment.
arXiv Detail & Related papers (2024-04-27T18:40:05Z)
- 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360$^\circ$ Assessment for Multi-Agent System [71.96888731208838]
We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Conveying Autonomous Robot Capabilities through Contrasting Behaviour Summaries [8.413049356622201]
We present an adaptive search method for efficiently generating contrasting behaviour summaries.
Our results indicate that adaptive search can efficiently identify informative contrasting scenarios that enable humans to accurately select the better performing agent.
arXiv Detail & Related papers (2023-04-01T18:20:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.