Multitask Learning in Minimally Invasive Surgical Vision: A Review
- URL: http://arxiv.org/abs/2401.08256v1
- Date: Tue, 16 Jan 2024 10:18:57 GMT
- Title: Multitask Learning in Minimally Invasive Surgical Vision: A Review
- Authors: Oluwatosin Alabi, Tom Vercauteren, Miaojing Shi
- Abstract summary: Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury.
Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy.
Recent advancements in machine learning and computer vision have led to successful applications in analyzing videos obtained from MIS with the promise of alleviating challenges in MIS videos.
- Score: 12.325297234992076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Minimally invasive surgery (MIS) has revolutionized many procedures and led
to reduced recovery time and risk of patient injury. However, MIS poses
additional complexity and burden on surgical teams. Data-driven surgical vision
algorithms are thought to be key building blocks in the development of future
MIS systems with improved autonomy. Recent advancements in machine learning and
computer vision have led to successful applications in analyzing videos
obtained from MIS with the promise of alleviating challenges in MIS videos.
Surgical scene and action understanding encompasses multiple related tasks
that, when solved individually, can be memory-intensive, inefficient, and fail
to capture task relationships. Multitask learning (MTL), a learning paradigm
that leverages information from multiple related tasks to improve performance
and aid generalization, is well-suited for fine-grained and high-level
understanding of MIS data. This review provides an overview of the current
state-of-the-art MTL systems that leverage videos obtained from MIS. Beyond
listing published approaches, we discuss the benefits and limitations of these
MTL systems. Moreover, this manuscript presents an analysis of the literature
for various application fields of MTL in MIS, including those with large
models, highlighting notable trends, new directions of research, and
developments.
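The core MTL pattern surveyed in the review is hard parameter sharing: a single shared backbone feeds several lightweight task heads, and training optimizes a weighted sum of per-task losses so that related tasks (e.g. instrument segmentation and surgical phase recognition in MIS) share one memory-efficient representation. A minimal illustrative sketch, with all names (backbone, heads, loss weights) invented here rather than taken from any specific paper:

```python
# Hard parameter sharing, sketched with toy functions: one shared
# "backbone" computes features, two task heads consume them, and a
# weighted sum of squared errors acts as the multitask objective.
# All components are illustrative stand-ins, not a real model.

def shared_backbone(x):
    """Toy shared feature extractor used by every task."""
    return [v * 0.5 for v in x]

def segmentation_head(features):
    """Task head 1 (e.g. an instrument-segmentation score)."""
    return sum(features)

def phase_head(features):
    """Task head 2 (e.g. a surgical-phase score)."""
    return max(features)

def multitask_loss(x, targets, weights):
    """Weighted sum of per-task squared errors over shared features."""
    feats = shared_backbone(x)
    preds = [segmentation_head(feats), phase_head(feats)]
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, preds, targets))

loss = multitask_loss([2.0, 4.0], targets=[2.0, 1.0], weights=[1.0, 0.5])
```

Because the backbone is shared, one forward pass serves both heads; the loss weights are where the task-balancing difficulties discussed in the review (e.g. negative transfer) show up in practice.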
Related papers
- A Multivocal Review of MLOps Practices, Challenges and Open Issues [9.227450931458907]
We conduct a Multivocal Literature Review (MLR) of 150 relevant academic studies and 48 gray literature to provide a comprehensive body of knowledge on MLOps.
We identify the emerging MLOps practices, adoption challenges and solutions related to various areas, including development and operation of complex pipelines, managing production at scale, managing artifacts, and ensuring quality, security, governance, and ethical aspects.
arXiv Detail & Related papers (2024-06-14T05:47:13Z) - Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models [87.47400128150032]
We propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.
Lumen first promotes fine-grained vision-language concept alignment.
Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders.
arXiv Detail & Related papers (2024-03-12T04:13:45Z) - Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives [56.2139730920855]
We present a systematic analysis of MM-VUFMs specifically designed for road scenes.
Our objective is to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques.
We provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models.
arXiv Detail & Related papers (2024-02-05T12:47:09Z) - Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models [121.83413400686139]
This paper proposes to improve the visual perception ability of MLLMs through a mixture-of-experts knowledge enhancement mechanism.
We introduce a novel method that incorporates multi-task encoders and visual tools into the existing MLLMs training and inference pipeline.
arXiv Detail & Related papers (2024-01-06T02:02:34Z) - Visual AI and Linguistic Intelligence Through Steerability and Composability [0.0]
This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision.
The research presents a series of 14 creatively and constructively diverse tasks, ranging from AI Lego Designing to AI Satellite Image Analysis.
arXiv Detail & Related papers (2023-11-18T22:01:33Z) - Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z) - MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning [42.68425777473114]
Vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity.
We introduce vision-language Model with Multi-Modal In-Context Learning (MMICL), a new approach to allow the VLM to deal with multi-modal inputs efficiently.
Our experiments confirm that MMICL achieves new state-of-the-art zero-shot performance on a wide range of general vision-language tasks.
arXiv Detail & Related papers (2023-09-14T17:59:17Z) - Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications.
We also review the potential pitfalls of IT and criticisms raised against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z) - Improving Multi-task Learning via Seeking Task-based Flat Regions [43.85516379095757]
Multi-Task Learning (MTL) is a powerful learning paradigm for training deep neural networks that allows learning more than one objective by a single backbone.
There is an emerging line of work in MTL that focuses on manipulating the task gradient to derive an ultimate gradient descent direction.
We propose to leverage a recently introduced training method, named Sharpness-aware Minimization, which can enhance model generalization ability on single-task learning.
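Sharpness-aware Minimization (SAM), which that paper adapts to MTL, updates weights using the gradient at an adversarially perturbed point within a small rho-ball, steering optimization toward flat regions of the loss surface. A one-dimensional sketch of the standard SAM update, using numerical gradients (the function, step sizes, and loop count are illustrative choices, not from the paper):

```python
# Sharpness-aware Minimization in 1D: ascend rho units in the gradient
# direction, then descend using the gradient taken at that perturbed
# point. Central differences stand in for autodiff.

def grad(f, w, eps=1e-6):
    """Numerical derivative of f at w via central differences."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def sam_step(f, w, lr=0.1, rho=0.05):
    g = grad(f, w)
    # Step to the (approximate) worst-case point within a rho-ball.
    w_adv = w + rho * g / (abs(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed point.
    return w - lr * grad(f, w_adv)

f = lambda w: (w - 3.0) ** 2  # toy loss with minimum at w = 3
w = 0.0
for _ in range(200):
    w = sam_step(f, w)
```

On this convex toy loss SAM settles into a tight cycle around the minimum at w = 3; its benefit over plain gradient descent only appears on non-convex losses with sharp and flat minima, which the 1D sketch cannot show.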
arXiv Detail & Related papers (2022-11-24T17:19:30Z) - A Comprehensive Evaluation of Multi-task Learning and Multi-task Pre-training on EHR Time-series Data [0.0]
Multi-task learning (MTL) is a machine learning technique aiming to improve model performance by leveraging information across many tasks.
In this work, we examine MTL across a battery of tasks on EHR time-series data.
We find that while MTL does suffer from common negative transfer, we can realize significant gains via MTL pre-training combined with single-task fine-tuning.
arXiv Detail & Related papers (2020-07-20T15:19:28Z) - Multi-Task Learning for Dense Prediction Tasks: A Survey [87.66280582034838]
Multi-task learning (MTL) techniques have shown promising results with respect to performance, computation, and/or memory footprint.
We provide a well-rounded view on state-of-the-art deep learning approaches for MTL in computer vision.
arXiv Detail & Related papers (2020-04-28T09:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.