SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth
Estimation with Learnable Patchmatch
- URL: http://arxiv.org/abs/2205.15034v1
- Date: Mon, 30 May 2022 12:11:03 GMT
- Title: SMUDLP: Self-Teaching Multi-Frame Unsupervised Endoscopic Depth
Estimation with Learnable Patchmatch
- Authors: Shuwei Shao, Zhongcai Pei, Weihai Chen, Xingming Wu, Zhong Liu,
Zhengguo Li
- Abstract summary: Unsupervised monocular trained depth estimation models make use of adjacent frames as a supervisory signal during the training phase.
However, temporally correlated frames are also available at inference time for many clinical applications, e.g., surgical navigation.
We present SMUDLP, a novel and unsupervised paradigm for multi-frame monocular endoscopic depth estimation.
- Score: 25.35009126980672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised monocular trained depth estimation models make use of adjacent
frames as a supervisory signal during the training phase. However, temporally
correlated frames are also available at inference time for many clinical
applications, e.g., surgical navigation. The vast majority of monocular systems
do not exploit this valuable signal, which could be used to enhance the depth
estimates. Those that do exploit it achieve only limited gains due to the unique
challenges in endoscopic scenes, such as low and homogeneous textures and
inter-frame brightness fluctuations. In this work, we present SMUDLP, a novel
and unsupervised paradigm for multi-frame monocular endoscopic depth
estimation. SMUDLP integrates a learnable patchmatch module to adaptively
increase the discriminative ability in low-texture and homogeneous-texture
regions, and enforces cross-teaching and self-teaching consistencies that provide
effective regularization against brightness fluctuations. Our detailed
experiments on both the SCARED and Hamlyn datasets indicate that SMUDLP exceeds
state-of-the-art competitors by a large margin, including those that use single
or multiple frames at inference time. The source code and trained models will
be made publicly available upon acceptance.
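The abstract names two ingredients: a patchmatch-style matching cost that aggregates evidence over local patches, and consistency terms against teacher predictions. As a rough illustration only, the PyTorch sketch below shows one plausible form of each; the function names, tensor shapes, fixed (non-learnable) window, and L1 loss form are assumptions made here, not the authors' implementation.

```python
# Illustrative sketch only: one plausible reading of a patch-based matching
# cost and detached-teacher consistency terms. Names, shapes, the fixed
# window (the paper's patch sampling is learnable), and the L1 loss form
# are assumptions, not the SMUDLP implementation.
import torch
import torch.nn.functional as F

def patch_matching_cost(ref_feat, warped_src_feats, patch_size=3):
    """ref_feat: (B, C, H, W) features of the reference frame.
    warped_src_feats: (B, D, C, H, W) source-frame features pre-warped into
    the reference view under D candidate depth hypotheses.
    Returns a (B, D, H, W) matching volume (higher = better match)."""
    B, D, C, H, W = warped_src_feats.shape
    # Per-pixel, per-hypothesis feature correlation.
    corr = (ref_feat.unsqueeze(1) * warped_src_feats).sum(dim=2) / C ** 0.5
    # Aggregate correlation over a local patch so that pixels in low-texture
    # and homogeneous-texture regions borrow evidence from their neighbourhood.
    pad = patch_size // 2
    corr = F.avg_pool2d(corr.reshape(B * D, 1, H, W), patch_size,
                        stride=1, padding=pad)
    return corr.reshape(B, D, H, W)

def teaching_consistency(student_depth, teacher_depth):
    """L1 agreement with a detached teacher: with a single-frame teacher this
    acts as a cross-teaching term; with the model's own earlier prediction as
    teacher it acts as a self-teaching term. The teacher is detached so the
    consistency term only regularizes the student."""
    return torch.abs(student_depth - teacher_depth.detach()).mean()
```

A softmax over the D hypotheses of the matching volume would then give a per-pixel depth distribution from which an expected depth can be regressed.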
Related papers
- A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces ADer, a comprehensive visual anomaly detection benchmark built as a modular framework that is readily extensible to new methods.
The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics.
We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z) - OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation [55.676358801492114]
We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside optical flows in between.
Our evaluations demonstrate superior quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
arXiv Detail & Related papers (2024-03-26T20:23:48Z) - Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for external training data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z) - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth
Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing heuristically crafted masks.
Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z) - Assessing Coarse-to-Fine Deep Learning Models for Optic Disc and Cup
Segmentation in Fundus Images [0.0]
Coarse-to-fine deep learning algorithms are used to efficiently measure the vertical cup-to-disc ratio (vCDR) in fundus images.
We present a comprehensive analysis of different coarse-to-fine designs for OD/OC segmentation using 5 public databases.
Our analysis shows that these algorithms do not necessarily outperform standard multi-class single-stage models.
arXiv Detail & Related papers (2022-09-28T19:19:16Z) - Anomaly Detection in Retinal Images using Multi-Scale Deep Feature
Sparse Coding [30.097208168480826]
We introduce an unsupervised approach based on multi-scale deep feature sparse coding for detecting anomalies in retinal images.
We achieve relative AUC score improvement of 7.8%, 6.7% and 12.1% over state-of-the-art SPADE on Eye-Q, IDRiD and OCTID datasets respectively.
arXiv Detail & Related papers (2022-01-27T13:36:22Z) - Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy:
Appearance Flow to the Rescue [38.168759071532676]
Self-supervised learning has been applied to estimate depth and ego-motion from monocular videos.
In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem.
We build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes (a hedged sketch of such a brightness-compensation term appears after this list).
arXiv Detail & Related papers (2021-12-15T13:51:10Z) - Incremental Cross-Domain Adaptation for Robust Retinopathy Screening via
Bayesian Deep Learning [7.535751594024775]
Retinopathy represents a group of retinal diseases that, if not treated in a timely manner, can cause severe visual impairments or even blindness.
This paper presents a novel incremental cross-domain adaptation instrument that allows any deep classification model to progressively learn abnormal retinal pathologies.
The proposed framework, evaluated on six public datasets, outperforms the state-of-the-art competitors by achieving an overall accuracy and F1 score of 0.9826 and 0.9846, respectively.
arXiv Detail & Related papers (2021-10-18T13:45:21Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z)
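For the appearance-flow entry above, here is the promised sketch: a minimal, hypothetical form of a brightness-compensated photometric loss, assuming an additive per-pixel adjustment field predicted by an auxiliary network. The loss form and tensor shapes are assumptions, not the published formulation of that paper.

```python
# Minimal, hypothetical sketch of a brightness-compensated photometric loss
# in the spirit of the appearance-flow idea; the additive per-pixel
# adjustment and L1 form are assumptions, not the published formulation.
import torch

def compensated_photometric_loss(target, warped_source, appearance_flow):
    """target, warped_source: (B, 3, H, W) reference image and source image
    warped into the reference view; appearance_flow: (B, 1, H, W) predicted
    per-pixel brightness adjustment aligning the warped source to the target."""
    calibrated = warped_source + appearance_flow  # undo inter-frame brightness change
    return torch.abs(target - calibrated).mean()  # L1 photometric residual
```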
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.