Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
- URL: http://arxiv.org/abs/2504.14822v2
- Date: Mon, 02 Jun 2025 17:34:14 GMT
- Title: Completing A Systematic Review in Hours instead of Months with Interactive AI Agents
- Authors: Rui Qiu, Shijie Chen, Yu Su, Po-Yin Yen, Han-Wei Shen
- Abstract summary: We introduce InsightAgent, a human-centered interactive AI agent powered by large language models. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs.
 - Score: 21.934330935124866
 - License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Systematic reviews (SRs) are vital for evidence-based practice in high-stakes disciplines, such as healthcare, but are often impeded by intensive labor and lengthy processes that can take months to complete. Due to the high demand for domain expertise, existing automatic summarization methods fail to accurately identify relevant studies and generate high-quality summaries. To that end, we introduce InsightAgent, a human-centered interactive AI agent powered by large language models that revolutionizes this workflow. InsightAgent partitions a large literature corpus based on semantics and employs a multi-agent design for more focused processing of the literature, leading to significant improvements in the quality of generated SRs. InsightAgent also provides intuitive visualizations of the corpus and agent trajectories, allowing users to effortlessly monitor the agent's actions and provide real-time feedback based on their expertise. Our user studies with 9 medical professionals demonstrate that the visualization and interaction mechanisms can effectively improve the quality of synthesized SRs by 27.2%, reaching 79.7% of human-written quality. At the same time, user satisfaction improves by 34.4%. With InsightAgent, it takes a clinician only about 1.5 hours, rather than months, to complete a high-quality systematic review.
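The abstract does not include implementation details, but the corpus-partitioning idea it describes can be illustrated with a minimal, hypothetical sketch: embed the abstracts, cluster them into semantic partitions, and hand each partition to its own agent for focused summarization. Everything below (TF-IDF features, k-means clustering, and the `summarize_partition` placeholder) is an assumption for illustration, not InsightAgent's actual design.

```python
# Hypothetical sketch of semantic corpus partitioning for multi-agent review.
# Not InsightAgent's actual implementation; all names here are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def partition_corpus(abstracts, n_partitions=2):
    """Cluster abstracts into semantically similar groups (TF-IDF + k-means)."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    labels = KMeans(n_clusters=n_partitions, n_init=10, random_state=0).fit_predict(vectors)
    partitions = {}
    for abstract, label in zip(abstracts, labels):
        partitions.setdefault(label, []).append(abstract)
    return partitions

def summarize_partition(docs):
    """Placeholder for a per-partition agent; a real system would call an LLM here."""
    return f"[summary of {len(docs)} related studies]"

corpus = [
    "Randomized trial of drug A for hypertension ...",
    "Meta-analysis of drug A dosing in older adults ...",
    "Qualitative study of nurse workflows in the ICU ...",
]
# Each partition gets its own focused draft, which a reviewer could then
# inspect and correct before the drafts are merged into a full SR.
drafts = [summarize_partition(docs) for docs in partition_corpus(corpus).values()]
print(drafts)
```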
 
       
      
        Related papers
- AI Agents-as-Judge: Automated Assessment of Accuracy, Consistency, Completeness and Clarity for Enterprise Documents [0.0]
This study presents a modular, multi-agent system for the automated review of highly structured enterprise business documents using AI agents. It uses modern orchestration tools such as LangChain, CrewAI, TruLens, and Guidance to enable section-by-section evaluation of documents. It achieves 99% information consistency (vs. 92% for humans), halving error and bias rates, and reducing average review time from 30 to 2.5 minutes per document.
arXiv Detail & Related papers (2025-06-23T17:46:15Z)
- Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism [48.41735416075536]
Interactive Imitation Learning (IIL) allows agents to acquire desired behaviors through human interventions. We propose the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations.
arXiv Detail & Related papers (2025-06-10T18:43:26Z)
- ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies [16.90884865239373]
We introduce ResearchCodeAgent, a novel multi-agent system to automate the codification of research methodologies.
The system bridges the gap between high-level research concepts and their practical implementation.
ResearchCodeAgent represents a significant step towards automating the research implementation process, potentially accelerating the pace of machine learning research.
arXiv Detail & Related papers (2025-04-28T07:18:45Z)
- An Illusion of Progress? Assessing the Current State of Web Agents [49.76769323750729]
We conduct a comprehensive and rigorous assessment of the current state of web agents. Results depict a very different picture of the competency of current agents, suggesting over-optimism in previously reported results. We introduce Online-Mind2Web, an online evaluation benchmark consisting of 300 diverse and realistic tasks spanning 136 websites.
arXiv Detail & Related papers (2025-04-02T05:51:29Z)
- TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews [54.35097932763878]
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data. Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews. We demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness.
arXiv Detail & Related papers (2025-03-26T15:58:16Z)
- Reinforcing Clinical Decision Support through Multi-Agent Systems and Ethical AI Governance [0.0]
We compare novel agent system designs that use modular agents to analyze laboratory results, vital signs, and clinical context. We implement our agent system with the eICU database, including lab-analysis, vitals-only interpreter, and contextual-reasoner agents.
arXiv Detail & Related papers (2025-03-25T05:32:43Z)
- VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures [3.075266204492352]
Large language model (LLM) agents in compound AI systems often fail to meet human standards, leading to errors that compromise the system's overall performance. This paper introduces a human-centered evaluation framework for verifying LLM agent failures (VeriLA). VeriLA systematically assesses agent failures to reduce human effort and make these agent failures interpretable to humans.
arXiv Detail & Related papers (2025-03-16T21:11:18Z)
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- Using Generative AI and Multi-Agents to Provide Automatic Feedback [4.883570605293337]
This study investigates the use of generative AI and multi-agent systems to provide automatic feedback in educational contexts.
The research addresses a key gap in the field by exploring how a multi-agent system, called AutoFeedback, can improve the quality of GenAI-generated feedback.
arXiv Detail & Related papers (2024-11-11T22:27:36Z)
- Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
- Advancing Healthcare Automation: Multi-Agent System for Medical Necessity Justification [0.0]
This paper explores the application of a Multi-Agent System (MAS) that utilizes specialized LLM agents to automate the Prior Authorization task.
We demonstrate that a GPT-4 checklist achieves an accuracy of 86.2% in predicting item-level judgments with evidence, and 95.6% in determining overall checklist judgment.
arXiv Detail & Related papers (2024-04-27T18:40:05Z)
- 360$^\circ$REA: Towards A Reusable Experience Accumulation with 360$^\circ$ Assessment for Multi-Agent System [71.96888731208838]
We argue that a comprehensive evaluation and accumulating experience from evaluation feedback is an effective approach to improving system performance. We propose Reusable Experience Accumulation with 360$^\circ$ Assessment (360$^\circ$REA), a hierarchical multi-agent framework inspired by corporate organizational practices.
arXiv Detail & Related papers (2024-04-08T14:43:13Z)
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models.
Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
- Conveying Autonomous Robot Capabilities through Contrasting Behaviour Summaries [8.413049356622201]
We present an adaptive search method for efficiently generating contrasting behaviour summaries.
Our results indicate that adaptive search can efficiently identify informative contrasting scenarios that enable humans to accurately select the better performing agent.
arXiv Detail & Related papers (2023-04-01T18:20:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.