Endo-Sim2Real: Consistency learning-based domain adaptation for instrument segmentation
- URL: http://arxiv.org/abs/2007.11514v1
- Date: Wed, 22 Jul 2020 16:18:11 GMT
- Title: Endo-Sim2Real: Consistency learning-based domain adaptation for instrument segmentation
- Authors: Manish Sahu, Ronja Strömsdörfer, Anirban Mukhopadhyay, and Stefan Zachow
- Abstract summary: Surgical tool segmentation in endoscopic videos is an important component of computer-assisted intervention systems.
The recent success of image-based solutions using fully-supervised deep learning can be attributed to the collection of large labeled datasets.
Computer simulations could alleviate the manual labeling problem; however, models trained on simulated data do not generalize to real data.
This work proposes a consistency-based framework for joint learning of simulated and real (unlabeled) endoscopic data.
- Score: 1.086731011437779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical tool segmentation in endoscopic videos is an important component
of computer-assisted intervention systems. The recent success of image-based
solutions using fully-supervised deep learning can be attributed to the
collection of large labeled datasets. However, annotating a large dataset of
real videos can be prohibitively expensive and time-consuming. Computer
simulations could alleviate the manual labeling problem; however, models
trained on simulated data do not generalize to real data. This work proposes a
consistency-based framework for joint learning of simulated and real
(unlabeled) endoscopic data to bridge this generalization gap. Empirical
results on two datasets (15 videos from Cholec80 and the EndoVis'15 dataset)
highlight the effectiveness of the proposed \emph{Endo-Sim2Real} method for
instrument segmentation. We compare the segmentation of the proposed approach
with state-of-the-art solutions and show that our method improves segmentation
both qualitatively and quantitatively.
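The joint learning idea described above can be sketched as a combined objective: a supervised segmentation loss on labeled simulated frames plus a consistency term that penalizes disagreement between predictions for two augmented views of the same unlabeled real frame. The function names, the MSE form of the consistency term, and the weighting `lam` below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pixel_bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))

def consistency_mse(pred_a, pred_b):
    """Mean squared disagreement between predictions for two augmented views."""
    return float(np.mean((pred_a - pred_b) ** 2))

def joint_loss(sim_pred, sim_mask, real_pred_view1, real_pred_view2, lam=1.0):
    """Supervised loss on simulated data plus consistency loss on unlabeled real data."""
    return pixel_bce(sim_pred, sim_mask) + lam * consistency_mse(real_pred_view1, real_pred_view2)
```

With perfect predictions on the simulated batch and identical predictions for both real views, the joint loss approaches zero; any disagreement between the two views adds a penalty that pushes the network toward augmentation-invariant segmentations of real frames.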
Related papers
- Match Stereo Videos via Bidirectional Alignment [15.876953256378224]
Recent learning-based methods often focus on optimizing performance for independent stereo pairs, leading to temporal inconsistencies in videos.
We introduce a novel video processing framework, BiDAStereo, and a plugin stabilizer network, BiDAStabilizer, compatible with general image-based methods.
We present a realistic synthetic dataset and benchmark focused on natural scenes, along with a real-world dataset captured by a stereo camera in diverse urban scenes for qualitative evaluation.
arXiv Detail & Related papers (2024-09-30T13:37:29Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS
Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the blended high-level temporal feature to form a strong global feature encoded via a Swin Transformer.
arXiv Detail & Related papers (2023-02-22T12:09:39Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We present extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by a comprehensive analysis of the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset, not for the target dataset, during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the datasets' taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation [1.1047993346634768]
We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data.
Empirical results on three datasets highlight the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-03-02T09:30:28Z)
- Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z)
- Unsupervised Learning Consensus Model for Dynamic Texture Videos Segmentation [12.462608802359936]
We present an effective unsupervised learning consensus model (ULCM) for the segmentation of dynamic texture videos.
In the proposed model, the set of values of the requantized local binary pattern (LBP) histogram around the pixel to be classified is used as the feature set.
Experiments conducted on the challenging SynthDB dataset show that ULCM is significantly faster, simpler to implement, and has only a few parameters.
arXiv Detail & Related papers (2020-06-29T16:40:59Z)
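The ULCM entry above builds its per-pixel features from a requantized LBP histogram. A minimal sketch of that feature, assuming a basic 8-neighbour LBP and an illustrative requantization to 16 bins (the paper's exact neighbourhood, ordering, and bin count may differ):

```python
import numpy as np

def lbp_code(patch):
    """8-neighbour local binary pattern code for the centre pixel of a 3x3 patch."""
    center = patch[1, 1]
    # Clockwise neighbours starting at the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    # Each neighbour >= centre contributes one bit to an 8-bit code.
    return sum(1 << i for i, n in enumerate(neighbours) if n >= center)

def lbp_histogram(image, bins=16):
    """Requantized, normalized LBP histogram over interior pixels of a grayscale image."""
    h, w = image.shape
    codes = [lbp_code(image[i - 1:i + 2, j - 1:j + 2])
             for i in range(1, h - 1) for j in range(1, w - 1)]
    hist, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return hist / hist.sum()
```

In a consensus setup such a histogram, computed in a window around each pixel, would serve as the feature vector fed to several clustering runs whose labelings are then merged.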
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.