Temporally Constrained Neural Networks (TCNN): A framework for
semi-supervised video semantic segmentation
- URL: http://arxiv.org/abs/2112.13815v1
- Date: Mon, 27 Dec 2021 18:06:12 GMT
- Title: Temporally Constrained Neural Networks (TCNN): A framework for
semi-supervised video semantic segmentation
- Authors: Deepak Alapatt, Pietro Mascagni, Armine Vardazaryan, Alain Garcia,
Nariaki Okamoto, Didier Mutter, Jacques Marescaux, Guido Costamagna, Bernard
Dallemagne, Nicolas Padoy
- Abstract summary: We present Temporally Constrained Neural Networks (TCNN), a semi-supervised framework used for video semantic segmentation of surgical videos.
In this work, we show that autoencoder networks can be used to efficiently provide both spatial and temporal supervisory signals.
We demonstrate that lower-dimensional representations of predicted masks can be leveraged to provide a consistent improvement on two sparsely labeled datasets.
- Score: 5.0754434714665715
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A major obstacle to building models for effective semantic segmentation, and
particularly video semantic segmentation, is a lack of large and well annotated
datasets. This bottleneck is particularly prohibitive in highly specialized and
regulated fields such as medicine and surgery, where video semantic
segmentation could have important applications but data and expert annotations
are scarce. In these settings, temporal clues and anatomical constraints could
be leveraged during training to improve performance. Here, we present
Temporally Constrained Neural Networks (TCNN), a semi-supervised framework used
for video semantic segmentation of surgical videos. In this work, we show that
autoencoder networks can be used to efficiently provide both spatial and
temporal supervisory signals to train deep learning models. We test our method
on a newly introduced video dataset of laparoscopic cholecystectomy procedures,
Endoscapes, and an adaptation of a public dataset of cataract surgeries, CaDIS.
We demonstrate that lower-dimensional representations of predicted masks can be
leveraged to provide a consistent improvement on both sparsely labeled datasets
with no additional computational cost at inference time. Further, the TCNN
framework is model-agnostic and can be used in conjunction with other model
design choices with minimal additional complexity.
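The core idea described in the abstract, a low-dimensional encoding of predicted segmentation masks that supplies a temporal consistency signal, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: a fixed random linear projection stands in for the trained autoencoder's encoder, and the regularizer simply penalizes latent distance between consecutive frames' predicted masks.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(mask, proj):
    """Project a flattened soft mask into a low-dimensional latent vector.

    Stand-in for a trained autoencoder encoder (hypothetical)."""
    return proj @ mask.ravel()

def temporal_consistency_loss(masks, proj):
    """Mean squared latent distance between consecutive predicted masks."""
    latents = [encode(m, proj) for m in masks]
    diffs = [np.mean((a - b) ** 2) for a, b in zip(latents, latents[1:])]
    return float(np.mean(diffs))

# Toy "predicted masks" for a 4-frame clip of 8x8 soft segmentation maps.
H, W, D = 8, 8, 16
proj = rng.normal(size=(D, H * W)) / np.sqrt(H * W)
masks = [np.clip(0.5 + 0.01 * t + rng.normal(scale=0.01, size=(H, W)), 0, 1)
         for t in range(4)]

loss = temporal_consistency_loss(masks, proj)
print(f"temporal consistency loss: {loss:.6f}")
```

Because the penalty is computed only during training, such a term adds no cost at inference time, which matches the property claimed in the abstract.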
Related papers
- TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation [6.013821375459473]
We introduce a novel deep learning architecture for medical image segmentation.
Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets.
arXiv Detail & Related papers (2024-09-05T09:14:03Z)
- Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models.
We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
- A spatio-temporal network for video semantic segmentation in surgical videos [11.548181453080087]
We propose a novel architecture for modelling temporal relationships in videos.
The proposed model includes a decoder to enable semantic video segmentation.
The proposed decoder can be used on top of any segmentation encoder to improve temporal consistency.
arXiv Detail & Related papers (2023-06-19T16:36:48Z)
- NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z)
- Surgical Skill Assessment via Video Semantic Aggregation [20.396898001950156]
We propose a skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across temporal dimensions.
The explicit discovery of semantic parts provides an explanatory visualization that helps understand the neural network's decisions.
The experiments on two datasets show the competitiveness of ViSA compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-08-04T12:24:01Z)
- Anatomy-Constrained Contrastive Learning for Synthetic Segmentation without Ground-truth [8.513014699605499]
We developed an anatomy-constrained contrastive synthetic segmentation network (AccSeg-Net) to train a segmentation network for a target imaging modality.
We demonstrated successful applications on CBCT, MRI, and PET imaging data, and showed superior segmentation performances as compared to previous methods.
arXiv Detail & Related papers (2021-07-12T14:54:04Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel frustum-ultrasound-based catheter segmentation method.
The proposed method achieved state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach allows training image segmentation models without the need to acquire expensive annotations.
We test our proposed method on Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing a surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- Weak Supervision in Convolutional Neural Network for Semantic Segmentation of Diffuse Lung Diseases Using Partially Annotated Dataset [2.239917051803692]
We develop a semantic segmentation model for five kinds of lung diseases.
The DLDs considered in this work are consolidation, ground glass opacity, honeycombing, emphysema, and normal.
We propose a new weak supervision technique that effectively utilizes a partially annotated dataset.
arXiv Detail & Related papers (2020-02-27T06:17:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.