SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence
- URL: http://arxiv.org/abs/2503.10265v1
- Date: Thu, 13 Mar 2025 11:23:13 GMT
- Title: SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence
- Authors: Chang Han Low, Ziyue Wang, Tianyi Zhang, Zhitao Zeng, Zhu Zhuo, Evangelos B. Mazomenos, Yueming Jin
- Abstract summary: Integration of Vision-Language Models in surgical intelligence is hindered by hallucinations, domain knowledge gaps, and limited understanding of task interdependencies. We present SurgRAW, a CoT-driven multi-agent framework that delivers transparent, interpretable insights for most tasks in robotic-assisted surgery.
- Score: 16.584722724845182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Integration of Vision-Language Models (VLMs) in surgical intelligence is hindered by hallucinations, domain knowledge gaps, and limited understanding of task interdependencies within surgical scenes, undermining clinical reliability. While recent VLMs demonstrate strong general reasoning capabilities, they still lack the domain expertise and task awareness required for precise surgical scene interpretation. Although Chain-of-Thought (CoT) prompting can structure reasoning more effectively, current approaches rely on self-generated CoT steps, which often exacerbate inherent domain gaps and hallucinations. To overcome this, we present SurgRAW, a CoT-driven multi-agent framework that delivers transparent, interpretable insights for most tasks in robotic-assisted surgery. By employing specialized CoT prompts across five tasks (instrument recognition, action recognition, action prediction, patient data extraction, and outcome assessment), SurgRAW mitigates hallucinations through structured, domain-aware reasoning. Retrieval-Augmented Generation (RAG) is also integrated to incorporate external medical knowledge, bridging domain gaps and improving response reliability. Most importantly, a hierarchical agentic system ensures that CoT-embedded VLM agents collaborate effectively while understanding task interdependencies, with a panel discussion mechanism promoting logical consistency. To evaluate our method, we introduce SurgCoTBench, the first reasoning-based dataset with structured frame-level annotations. Through comprehensive experiments, we demonstrate the effectiveness of the proposed SurgRAW, which improves accuracy by 29.32% over baseline VLMs on 12 robotic procedures, achieving state-of-the-art performance and advancing explainable, trustworthy, and autonomous surgical assistance.
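The abstract does not include reference code, but the control flow it describes (hierarchical routing to task-specific agents, specialized CoT prompts, RAG augmentation, and a panel-discussion consistency check) can be illustrated with a minimal Python sketch. Every name below (SurgicalAgent, route_task, retrieve_context, panel_discussion, the prompt text, and the stubbed call_vlm backend) is a hypothetical stand-in, not the authors' implementation.

```python
# Minimal sketch of a SurgRAW-style pipeline: an orchestrator routes a query
# to task-specific agents whose CoT prompts are augmented with retrieved
# medical knowledge. The VLM backend is stubbed; all names are illustrative.
from dataclasses import dataclass

# Hypothetical task-specific CoT prompt templates (two of the paper's five tasks).
COT_PROMPTS = {
    "instrument_recognition": (
        "Step 1: Describe each visible tool tip.\n"
        "Step 2: Match tip geometry against known instrument classes.\n"
        "Step 3: State the instrument name with justification."
    ),
    "action_recognition": (
        "Step 1: Identify the active instrument.\n"
        "Step 2: Describe its motion relative to tissue.\n"
        "Step 3: Name the surgical action with justification."
    ),
}

# Toy in-memory knowledge base standing in for the RAG corpus.
KNOWLEDGE_BASE = {
    "instrument_recognition": "Monopolar curved scissors have a hinged blade pair.",
    "action_recognition": "Blunt dissection separates tissue planes without cutting.",
}

def call_vlm(prompt: str, frame_id: int) -> str:
    """Stub for the underlying vision-language model call."""
    return f"[VLM answer for frame {frame_id} given prompt of {len(prompt)} chars]"

def retrieve_context(task: str) -> str:
    """Toy retrieval step; a real system would embed and search a medical corpus."""
    return KNOWLEDGE_BASE.get(task, "")

@dataclass
class SurgicalAgent:
    """A CoT-embedded agent specialized for one surgical task."""
    task: str

    def answer(self, question: str, frame_id: int) -> str:
        prompt = (
            f"Task: {self.task}\n"
            f"Reference knowledge: {retrieve_context(self.task)}\n"
            f"Reason step by step:\n{COT_PROMPTS[self.task]}\n"
            f"Question: {question}"
        )
        return call_vlm(prompt, frame_id)

def route_task(question: str) -> str:
    """Crude keyword router standing in for the hierarchical orchestrator."""
    return "action_recognition" if "doing" in question else "instrument_recognition"

def panel_discussion(answers: list[str]) -> str:
    """Placeholder consistency check; a real panel would cross-examine agents."""
    return answers[0] if len(set(answers)) == 1 else "escalate: agents disagree"

if __name__ == "__main__":
    question = "What instrument is visible in this frame?"
    task = route_task(question)
    agents = [SurgicalAgent(task) for _ in range(2)]  # two agents form the panel
    answers = [agent.answer(question, frame_id=42) for agent in agents]
    print(panel_discussion(answers))
```

The sketch only mirrors the control flow named in the abstract; the actual framework replaces the stubs with CoT-embedded VLM agents and retrieval over external medical knowledge.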
Related papers
- CBM-RAG: Demonstrating Enhanced Interpretability in Radiology Report Generation with Multi-Agent RAG and Concept Bottleneck Models [1.7042756021131187]
This paper presents an automated radiology report generation framework that combines Concept Bottleneck Models (CBMs) with a Multi-Agent Retrieval-Augmented Generation (RAG) system.
CBMs map chest X-ray features to human-understandable clinical concepts, enabling transparent disease classification.
The RAG system integrates multi-agent collaboration and external knowledge to produce contextually rich, evidence-based reports.
arXiv Detail & Related papers (2025-04-29T16:14:55Z) - Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities [65.66373425605278]
Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events.
Previous research has focused on short, linear surgical procedures and has not explored whether temporal context improves experts' ability to classify surgical phases.
This research addresses these gaps, focusing on Robot-Assisted Partial Nephrectomy (RAPN) as a highly non-linear procedure.
arXiv Detail & Related papers (2025-04-26T15:37:22Z) - TAMA: A Human-AI Collaborative Thematic Analysis Framework Using Multi-Agent LLMs for Clinical Interviews [54.35097932763878]
Thematic analysis (TA) is a widely used qualitative approach for uncovering latent meanings in unstructured text data.
Here, we propose TAMA: A Human-AI Collaborative Thematic Analysis framework using Multi-Agent LLMs for clinical interviews.
We demonstrate that TAMA outperforms existing LLM-assisted TA approaches, achieving higher thematic hit rate, coverage, and distinctiveness.
arXiv Detail & Related papers (2025-03-26T15:58:16Z) - Transparent AI: Developing an Explainable Interface for Predicting Postoperative Complications [1.6609516435725236]
We propose an Explainable AI (XAI) framework designed to answer five critical questions.
We incorporated various techniques, such as Local Interpretable Model-agnostic Explanations (LIME).
We showcased an XAI interface prototype that adheres to this framework for predicting major postoperative complications.
arXiv Detail & Related papers (2024-04-18T21:01:27Z) - AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between a Doctor, as the player, and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z) - Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [47.47211257890948]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z) - Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z) - Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for
COVID-19 Screening With Chest Radiography [70.37371604119826]
Building AI models with trustworthiness is important especially in regulated areas such as healthcare.
Previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in decision-making.
We propose a feature learning approach using Vision Transformers, which use an attention-based mechanism.
arXiv Detail & Related papers (2022-07-19T14:55:42Z) - Real-time landmark detection for precise endoscopic submucosal
dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z) - Surgical Gesture Recognition Based on Bidirectional Multi-Layer
Independently RNN with Explainable Spatial Feature Extraction [10.469989981471254]
We aim to develop an effective surgical gesture recognition approach with an explainable feature extraction process.
A Bidirectional Multi-Layer independently RNN (BML-indRNN) model is proposed in this paper.
To eliminate the black-box effects of the DCNN, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed.
Results indicated that the testing accuracy for the suturing task based on our proposed method is 87.13%, which outperforms most of the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-02T12:47:19Z) - TeCNO: Surgical Phase Recognition with Multi-Stage Temporal
Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)