Multitask Learning in Minimally Invasive Surgical Vision: A Review
- URL: http://arxiv.org/abs/2401.08256v1
- Date: Tue, 16 Jan 2024 10:18:57 GMT
- Title: Multitask Learning in Minimally Invasive Surgical Vision: A Review
- Authors: Oluwatosin Alabi, Tom Vercauteren, Miaojing Shi
- Abstract summary: Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury.
Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy.
Recent advancements in machine learning and computer vision have led to successful applications in analyzing videos obtained from MIS, with the promise of alleviating challenges in MIS video analysis.
- Score: 12.325297234992076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury. However, MIS poses additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy. Recent advancements in machine learning and computer vision have led to successful applications in analyzing videos obtained from MIS, with the promise of alleviating challenges in MIS video analysis. Surgical scene and action understanding encompasses multiple related tasks that, when solved individually, can be memory-intensive, inefficient, and fail to capture task relationships. Multitask learning (MTL), a learning paradigm that leverages information from multiple related tasks to improve performance and aid generalization, is well-suited for fine-grained and high-level understanding of MIS data. This review provides an overview of the current state-of-the-art MTL systems that leverage videos obtained from MIS. Beyond listing published approaches, we discuss the benefits and limitations of these MTL systems. Moreover, this manuscript presents an analysis of the literature for various application fields of MTL in MIS, including those with large models, highlighting notable trends, new directions of research, and developments.
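To make the paradigm concrete, below is a minimal sketch of the hard-parameter-sharing setup most MTL vision systems build on: one shared encoder feeding lightweight task-specific heads, trained with a weighted sum of per-task losses. The toy encoder, the two illustrative tasks (tool segmentation, phase recognition), and the loss weights are assumptions for illustration, not the architecture of any specific surveyed paper.
```python
# Minimal sketch of hard-parameter-sharing MTL for surgical video frames.
# The heads and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurgicalMTLNet(nn.Module):
    def __init__(self, num_tool_classes: int = 8, num_phases: int = 7):
        super().__init__()
        # Shared encoder: a single backbone serves all tasks, which is what
        # makes MTL cheaper in memory than training one network per task.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Task-specific heads branch off the shared representation.
        self.seg_head = nn.Conv2d(64, num_tool_classes, 1)  # per-pixel labels
        self.phase_head = nn.Sequential(                    # per-frame label
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_phases),
        )

    def forward(self, frames):
        shared = self.encoder(frames)
        return self.seg_head(shared), self.phase_head(shared)

model = SurgicalMTLNet()
frames = torch.randn(4, 3, 224, 224)          # a batch of video frames
seg_logits, phase_logits = model(frames)      # (4, 8, 56, 56) and (4, 7)

# Joint objective: weighted sum of per-task losses. The 1.0/0.5 weights are
# assumptions; choosing such weights is itself an open MTL research question.
seg_target = torch.randint(0, 8, (4, 56, 56))
phase_target = torch.randint(0, 7, (4,))
loss = 1.0 * F.cross_entropy(seg_logits, seg_target) \
     + 0.5 * F.cross_entropy(phase_logits, phase_target)
loss.backward()
```
Because both heads backpropagate into the same encoder, gradients from each task regularize the other, which is the mechanism behind the performance and generalization benefits the abstract describes.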
Related papers
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models [12.841405829775852]
We introduce the modality importance score (MIS) to identify bias in VidQA benchmarks and datasets.
We also propose an innovative method using state-of-the-art MLLMs to estimate the modality importance.
Our results indicate that current models do not effectively integrate information due to modality imbalance in existing datasets.
arXiv Detail & Related papers (2024-08-22T23:32:42Z)
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically surveys the applications of MLLMs in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
- A Multivocal Review of MLOps Practices, Challenges and Open Issues [9.227450931458907]
We conduct a Multivocal Literature Review (MLR) of 150 relevant academic studies and 48 gray literature sources to provide a comprehensive body of knowledge on MLOps.
We identify the emerging MLOps practices, adoption challenges and solutions related to various areas, including development and operation of complex pipelines, managing production at scale, managing artifacts, and ensuring quality, security, governance, and ethical aspects.
arXiv Detail & Related papers (2024-06-14T05:47:13Z)
- Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models [87.47400128150032]
We propose a novel LMM architecture named Lumen, a large multimodal model with versatile vision-centric capability enhancement.
Lumen first promotes fine-grained vision-language concept alignment.
Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders.
arXiv Detail & Related papers (2024-03-12T04:13:45Z)
- Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models [121.83413400686139]
This paper proposes to improve the visual perception ability of MLLMs through a mixture-of-experts knowledge enhancement mechanism.
We introduce a novel method that incorporates multi-task encoders and visual tools into the existing MLLM training and inference pipeline.
arXiv Detail & Related papers (2024-01-06T02:02:34Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- A Comprehensive Evaluation of Multi-task Learning and Multi-task Pre-training on EHR Time-series Data [0.0]
Multi-task learning (MTL) is a machine learning technique aiming to improve model performance by leveraging information across many tasks.
In this work, we examine MTL across a battery of tasks on EHR time-series data.
We find that while MTL does suffer from the common problem of negative transfer, we can realize significant gains via MTL pre-training combined with single-task fine-tuning (see the sketch after this list).
arXiv Detail & Related papers (2020-07-20T15:19:28Z)
- Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
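One recurring recipe in the list above, highlighted by the EHR time-series entry, is multi-task pre-training followed by single-task fine-tuning. Below is a minimal sketch of that two-phase pattern on a deliberately tiny stand-in model; the dimensions, task sizes, and learning rates are illustrative assumptions, not the cited paper's setup.
```python
# Sketch of the MTL-pretrain-then-single-task-finetune recipe on a toy model.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(16, 32)   # shared trunk
head_a = nn.Linear(32, 4)     # auxiliary task used only during pre-training
head_b = nn.Linear(32, 7)     # target task we ultimately care about

# Phase 1: multi-task pre-training with a joint loss over both heads,
# so the shared trunk learns a representation useful to both tasks.
params = (list(encoder.parameters()) + list(head_a.parameters())
          + list(head_b.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
x = torch.randn(8, 16)
h = torch.relu(encoder(x))
loss = F.cross_entropy(head_a(h), torch.randint(0, 4, (8,))) \
     + F.cross_entropy(head_b(h), torch.randint(0, 7, (8,)))
opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: single-task fine-tuning. Freezing the shared trunk and updating
# only the target head is one simple way to keep the pre-trained
# representation while avoiding further negative transfer.
for p in encoder.parameters():
    p.requires_grad = False
ft_opt = torch.optim.Adam(head_b.parameters(), lr=1e-4)
ft_h = torch.relu(encoder(torch.randn(8, 16)))
ft_loss = F.cross_entropy(head_b(ft_h), torch.randint(0, 7, (8,)))
ft_opt.zero_grad(); ft_loss.backward(); ft_opt.step()
```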
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.