LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery
- URL: http://arxiv.org/abs/2402.16664v3
- Date: Wed, 23 Oct 2024 16:27:48 GMT
- Title: LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery
- Authors: Yuyang Du, Kexin Chen, Yue Zhan, Chang Han Low, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng
- Abstract summary: Patient data privacy often restricts the availability of old data when updating the model.
Prior CL studies overlooked two vital problems in the surgical domain.
This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology.
- Score: 57.358568111574314
- License:
- Abstract: Visual question answering (VQA) is crucial for promoting surgical education. In practice, the needs of trainees are constantly evolving, such as learning more surgical types, adapting to different robots, and learning new surgical instruments and techniques for various surgeries. However, patient data privacy often restricts the availability of old data when updating the model, necessitating an exemplar-free continual learning (CL) setup. Prior CL studies overlooked two vital problems in the surgical domain: 1) large domain shifts from diverse surgical operations collected from multiple sources, and 2) severe data imbalance arising from the uneven presence of surgical instruments or activities. This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology. We first develop a new multi-teacher CL framework that leverages a multimodal LLM as the additional teacher. The strong generalization ability of the LLM can bridge the knowledge gap when domain shifts and data imbalances occur. We then put forth a novel data processing method that transforms complex LLM embeddings into logits compatible with our CL framework. We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model. Finally, to comprehensively test the effectiveness of our proposed method, we have also constructed two new surgical VQA datasets that are largely different from existing ones and could be valuable resources for future research. Extensive experimental results on the tested datasets demonstrate the superiority of our method to other advanced CL schemes.
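No implementation accompanies this listing, so the following is a minimal PyTorch sketch of the three ingredients the abstract names: a projection that turns LLM answer embeddings into logits over the CL model's label space, an adaptive rule for weighting the old-model teacher against the LLM teacher, and a multi-teacher distillation loss. The dimensions, the inverse-cross-entropy weighting rule, and all names (`EmbeddingToLogits`, `multi_teacher_loss`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 18   # hypothetical answer-vocabulary size
LLM_DIM = 4096     # hypothetical LLM embedding width

class EmbeddingToLogits(nn.Module):
    """Projects multimodal-LLM answer embeddings to logits over the CL
    model's label space. The abstract only states that complex LLM
    embeddings are transformed into compatible logits; a learned linear
    projection is one plausible realization, not the paper's method."""
    def __init__(self, llm_dim: int = LLM_DIM, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.proj = nn.Linear(llm_dim, num_classes)

    def forward(self, llm_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(llm_emb)

def adaptive_teacher_weights(old_logits, llm_logits, labels):
    """Toy adaptive rule: weight each teacher by how well it explains the
    current batch (inverse cross-entropy). The paper's actual
    weight-assignment scheme is not reproduced here."""
    with torch.no_grad():
        ce_old = F.cross_entropy(old_logits, labels)
        ce_llm = F.cross_entropy(llm_logits, labels)
    w_old = ce_llm / (ce_old + ce_llm + 1e-8)  # lower loss -> larger weight
    return w_old, 1.0 - w_old

def multi_teacher_loss(student_logits, old_logits, llm_logits, labels,
                       T: float = 2.0, alpha: float = 0.5):
    """Cross-entropy on new labels plus distillation from both teachers,
    mixed by the adaptive weights; a sketch of the multi-teacher idea."""
    w_old, w_llm = adaptive_teacher_weights(old_logits, llm_logits, labels)
    log_s = F.log_softmax(student_logits / T, dim=-1)
    kd = (w_old * F.kl_div(log_s, F.softmax(old_logits / T, dim=-1),
                           reduction="batchmean")
          + w_llm * F.kl_div(log_s, F.softmax(llm_logits / T, dim=-1),
                             reduction="batchmean")) * (T * T)
    return alpha * F.cross_entropy(student_logits, labels) + (1 - alpha) * kd

# Toy usage with random tensors standing in for real model outputs.
B = 4
student_logits = torch.randn(B, NUM_CLASSES, requires_grad=True)
old_logits = torch.randn(B, NUM_CLASSES)                            # frozen old CL model
llm_logits = EmbeddingToLogits()(torch.randn(B, LLM_DIM)).detach()  # LLM teacher
labels = torch.randint(0, NUM_CLASSES, (B,))
multi_teacher_loss(student_logits, old_logits, llm_logits, labels).backward()
```

The trade-off the abstract argues for is visible in `adaptive_teacher_weights`: when a domain shift makes the old model unreliable on the new data, its batch loss rises and the LLM teacher's weight grows automatically.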
Related papers
- Demystifying Large Language Models for Medicine: A Primer [50.83806796466396]
Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare.
This tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice.
arXiv Detail & Related papers (2024-10-24T15:41:56Z)
- Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to mitigate hallucination in medical information extraction tasks.
ALCD demonstrates significant improvements over conventional decoding methods in resolving hallucinations (a generic contrastive-decoding sketch follows this entry).
arXiv Detail & Related papers (2024-10-21T07:19:19Z)
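ALCD's alternating schedule is not described in this snippet; the sketch below shows plain contrastive decoding, the family of methods ALCD belongs to. The penalty coefficient, plausibility threshold, and the random logits standing in for two model heads are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_decode_step(expert_logits: torch.Tensor,
                            amateur_logits: torch.Tensor,
                            alpha: float = 0.5,
                            plausibility: float = 0.1) -> int:
    """One decoding step of generic contrastive decoding: down-weight
    tokens the weaker 'amateur' model also likes, which tends to suppress
    generic (often hallucinated) continuations. ALCD's alternation of
    roles between sub-models is not reproduced here."""
    log_p_expert = F.log_softmax(expert_logits, dim=-1)
    log_p_amateur = F.log_softmax(amateur_logits, dim=-1)
    # Plausibility constraint: only keep tokens the expert itself assigns
    # at least `plausibility` times its maximum probability.
    p_expert = log_p_expert.exp()
    mask = p_expert >= plausibility * p_expert.max()
    scores = log_p_expert - alpha * log_p_amateur
    scores[~mask] = float("-inf")
    return int(scores.argmax())

# Toy usage: random logits over a 100-token vocabulary.
expert = torch.randn(100)
amateur = torch.randn(100)
print(contrastive_decode_step(expert, amateur))
```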
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently attracted substantial interest, showing emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs (a minimal retrieval sketch follows this entry).
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
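RA-BLIP's adaptive fusion is not specified in this snippet; below is a minimal sketch of the retrieval-augmentation idea it builds on: embed the question, rank passages by similarity, and prepend the top hits to the prompt. The hash-seeded embedding and the surgical corpus are toy stand-ins, not the paper's components.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic 'embedding' seeded by a hash of the text; a real
    system would use a learned multimodal encoder."""
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by cosine similarity to the query embedding and
    return the top k -- the core of any retrieval-augmented pipeline."""
    q = embed(query)
    sims = np.array([q @ embed(p) for p in corpus])
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

# Toy usage: retrieved passages are prepended to the question before the
# (M)LLM is queried.
corpus = [
    "The monopolar hook is used for dissection and coagulation.",
    "The critical view of safety precedes clipping of the cystic duct.",
    "Trocar placement varies with patient anatomy.",
]
question = "Which instrument is used for coagulation?"
context = "\n".join(retrieve(question, corpus))
print(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```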
- LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning [15.646322352232819]
We create a new dataset, Surg-QA, consisting of 102,000 surgical video-instruction pairs.
We propose a novel two-stage question-answer generation pipeline with an LLM to learn surgical knowledge (a schematic sketch of such a pipeline follows this entry).
We train LLaVA-Surg, a novel vision-language conversational assistant capable of answering open-ended questions about surgical videos.
arXiv Detail & Related papers (2024-08-15T07:00:20Z)
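The snippet only names a two-stage pipeline; the sketch below illustrates one plausible shape: stage one asks an LLM to extract structured facts from a transcript, stage two turns each fact into a QA pair. The prompts, the JSON schema, and `fake_llm` (a stub standing in for a real LLM API) are all hypothetical.

```python
import json
from typing import Callable

def two_stage_qa(transcript: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Stage 1: extract structured surgical facts from a video transcript.
    Stage 2: turn each extracted fact into an open-ended QA pair.
    Splitting the task lets the second prompt condition on clean,
    structured knowledge instead of raw narration."""
    stage1 = (
        "Extract surgical facts as a JSON list of "
        '{"instrument": ..., "action": ..., "target": ...} objects.\n\n'
        + transcript
    )
    facts = json.loads(call_llm(stage1))
    qa_pairs = []
    for fact in facts:
        stage2 = (
            "Write one open-ended question and its answer about this fact, "
            'as JSON {"question": ..., "answer": ...}.\n\n' + json.dumps(fact)
        )
        qa_pairs.append(json.loads(call_llm(stage2)))
    return qa_pairs

# Toy stub standing in for a real LLM call, so the sketch runs end to end.
def fake_llm(prompt: str) -> str:
    if "Extract surgical facts" in prompt:
        return '[{"instrument": "grasper", "action": "retract", "target": "gallbladder"}]'
    return '{"question": "What does the grasper do?", "answer": "It retracts the gallbladder."}'

print(two_stage_qa("The grasper retracts the gallbladder...", fake_llm))
```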
- Jumpstarting Surgical Computer Vision [2.7396997668655163]
We employ self-supervised learning to flexibly leverage diverse surgical datasets.
We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy.
The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
arXiv Detail & Related papers (2023-12-10T18:54:16Z)
- Revisiting Distillation for Continual Learning on Visual Question Localized-Answering in Robotic Surgery [20.509915509237818]
The visual-question localized-answering (VQLA) system can serve as a knowledgeable assistant in surgical education.
Deep neural networks (DNNs), however, suffer from catastrophic forgetting when learning new knowledge; the paper revisits distillation from the old model as the remedy (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-07-22T10:35:25Z)
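The entry above revisits distillation for exemplar-free continual learning; here is a minimal learning-without-forgetting-style sketch of that recipe, assuming a frozen snapshot of the old model and illustrative temperature and mixing weight. It is a generic baseline, not that paper's exact loss.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def cl_distillation_loss(model: nn.Module, old_model: nn.Module,
                         x: torch.Tensor, y: torch.Tensor,
                         T: float = 2.0, lam: float = 1.0) -> torch.Tensor:
    """Cross-entropy on the new task plus KL distillation toward the
    frozen old model's temperature-softened outputs -- the standard
    remedy for forgetting when old exemplars cannot be stored."""
    logits = model(x)
    with torch.no_grad():
        old_logits = old_model(x)
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(F.log_softmax(logits / T, dim=-1),
                  F.softmax(old_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    return ce + lam * kd

# Toy usage: a tiny classifier before and after a task switch.
model = nn.Linear(16, 5)
old_model = copy.deepcopy(model).eval()   # frozen snapshot of the old task
x, y = torch.randn(8, 16), torch.randint(0, 5, (8,))
cl_distillation_loss(model, old_model, x, y).backward()
```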
- How to Train Your CheXDragon: Training Chest X-Ray Models for Transfer to Novel Tasks and Healthcare Systems [0.118749525824656]
Self-supervised learning (SSL) enables label-efficient training for machine learning models.
In this work, we systematically experiment with a variety of supervised and self-supervised pretraining strategies.
We show that multimodal SSL gives substantial gains over unimodal SSL in performance across new healthcare systems and tasks (a sketch of the standard multimodal contrastive objective follows this entry).
arXiv Detail & Related papers (2023-05-13T22:33:09Z)
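"Multimodal SSL" in the entry above typically means image-text contrastive pretraining; the sketch below shows the canonical symmetric InfoNCE objective (CLIP-style) on paired embeddings. The random features standing in for X-ray and report encoders, and the temperature, are illustrative assumptions, not that paper's recipe.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: each image should match its paired report and
    vice versa, with all other pairs in the batch as negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage: random features stand in for image and text encoder outputs.
img = torch.randn(8, 128, requires_grad=True)
txt = torch.randn(8, 128)
clip_style_loss(img, txt).backward()
```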
- Adapter Learning in Pretrained Feature Extractor for Continual Learning of Diseases [66.27889778566734]
Currently, intelligent diagnosis systems lack the ability to continually learn to diagnose new diseases once deployed.
In particular, updating an intelligent diagnosis system with training data for new diseases would cause catastrophic forgetting of old disease knowledge.
An adapter-based continual learning framework called ACL is proposed to learn a set of new diseases effectively (a minimal adapter sketch follows this entry).
arXiv Detail & Related papers (2023-04-18T15:01:45Z)
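The entry above trains adapters inside a frozen pretrained extractor; below is a minimal bottleneck-adapter sketch in that spirit (down-project, nonlinearity, up-project, residual). ACL's actual adapter placement and per-task heads are not reproduced; the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck trained per task while the pretrained
    backbone stays frozen -- only these few parameters learn the new
    diseases, so old-task weights cannot be overwritten."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Toy usage: frozen feature extractor + trainable adapter + task head.
backbone = nn.Linear(64, 256)
for p in backbone.parameters():
    p.requires_grad = False                 # backbone stays fixed
adapter, head = BottleneckAdapter(256), nn.Linear(256, 4)
logits = head(adapter(backbone(torch.randn(2, 64))))
```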
- Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning [20.706268332427157]
Dealing with unexpected and repetitive tasks increases the associated Cognitive Workload (CWL).
In this paper, a cascade of two machine learning approaches is proposed for the multimodal recognition of CWL.
A Convolutional Neural Network (CNN) then uses this information to identify the different types of CWL associated with each surgical task.
arXiv Detail & Related papers (2022-09-12T18:29:34Z)
- Competence-based Multimodal Curriculum Learning for Medical Report Generation [98.10763792453925]
We propose a Competence-based Multimodal Curriculum Learning framework (CMCL) to alleviate data bias and make the best use of available data.
Specifically, CMCL simulates the learning process of radiologists and optimizes the model step by step (a generic curriculum sketch follows this entry).
Experiments on the public IU-Xray and MIMIC-CXR datasets show that CMCL can be incorporated into existing models to improve their performance.
arXiv Detail & Related papers (2022-06-24T08:16:01Z)
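The CMCL entry describes step-by-step, competence-based ordering of training data; the sketch below shows the generic competence-based curriculum idea: rank samples by a difficulty score and only admit those within the model's current competence, which grows over training. The square-root schedule and length-as-difficulty measure are illustrative assumptions, not CMCL's metrics.

```python
import random

def competence(t: int, T: int, c0: float = 0.1) -> float:
    """Fraction of the difficulty-sorted data available at step t,
    growing from c0 to 1 over T steps (square-root schedule)."""
    return min(1.0, (c0**2 + (1 - c0**2) * t / T) ** 0.5)

def curriculum_batches(samples, difficulty, T: int, batch_size: int = 4):
    """Yield batches drawn only from samples whose difficulty rank falls
    within the model's current competence."""
    ranked = sorted(samples, key=difficulty)
    for t in range(T):
        cutoff = max(batch_size, int(competence(t, T) * len(ranked)))
        yield random.sample(ranked[:cutoff], batch_size)

# Toy usage: 'reports' whose difficulty is simply their length.
reports = [f"report {'x' * n}" for n in range(20)]
for batch in curriculum_batches(reports, difficulty=len, T=5):
    pass  # train on progressively harder reports here
```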