One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video
- URL: http://arxiv.org/abs/2103.12988v1
- Date: Wed, 24 Mar 2021 05:02:18 GMT
- Title: One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video
- Authors: Zixu Zhao, Yueming Jin, Bo Lu, Chi-Fai Ng, Qi Dou, Yun-Hui Liu, and
Pheng-Ann Heng
- Abstract summary: MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
- Score: 71.43912903508765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical instrument segmentation in robot-assisted surgery (RAS) - especially
that using learning-based models - relies on the assumption that training and
testing videos are sampled from the same domain. However, it is impractical and
expensive to collect and annotate sufficient data from every new domain. To
greatly increase the label efficiency, we explore a new problem, i.e., adaptive
instrument segmentation, which is to effectively adapt one source model to new
robotic surgical videos from multiple target domains, only given the annotated
instruments in the first frame. We propose MDAL, a meta-learning based dynamic
online adaptive learning scheme with a two-stage framework to fast adapt the
model parameters on the first frame and partial subsequent frames while
predicting the results. MDAL learns the general knowledge of instruments and
the fast adaptation ability through the video-specific meta-learning paradigm.
The added gradient gate excludes the noisy supervision from pseudo masks for
dynamic online adaptation on target videos. We demonstrate empirically that
MDAL outperforms other state-of-the-art methods on two datasets (including a
real-world RAS dataset). The promising performance on ex-vivo scenes also
benefits the downstream tasks such as robot-assisted suturing and camera
control.
Related papers
- VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation [79.00294932026266]
VidMan is a novel framework that employs a two-stage training mechanism to enhance stability and improve data utilization efficiency.
Our framework outperforms state-of-the-art baseline model GR-1 on the CALVIN benchmark, achieving a 11.7% relative improvement, and demonstrates over 9% precision gains on the OXE small-scale dataset.
arXiv Detail & Related papers (2024-11-14T03:13:26Z) - Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z) - XVO: Generalized Visual Odometry via Cross-Modal Self-Training [11.70220331540621]
XVO is a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models.
In contrast to standard monocular VO approaches which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale.
We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube.
arXiv Detail & Related papers (2023-09-28T18:09:40Z) - Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards
Robot-assisted Intubation [15.795665057836636]
This work introduces a virtual dataset generated by the Open Framework Architecture framework to overcome the limited availability of actual endoscopic images.
We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy.
Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models.
arXiv Detail & Related papers (2023-05-19T14:08:15Z) - ProFormer: Learning Data-efficient Representations of Body Movement with
Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
arXiv Detail & Related papers (2022-02-23T11:11:54Z) - EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate the spatial-temporal kernels of dynamic-scale to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions only among a few selected foreground objects by a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos, and inherent correlations in multi-modal towards recognizing gesture.
Results show that our approach recovers the performance with great improvement gains, up to 12.91% in ACC and 20.16% in F1score without using any annotations in real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.