EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels
- URL: http://arxiv.org/abs/2409.05442v4
- Date: Mon, 06 Jan 2025 02:59:27 GMT
- Title: EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels
- Authors: Qingyao Tian, Zhen Chen, Huai Liao, Xinyan Huang, Lujie Li, Sebastien Ourselin, Hongbin Liu
- Abstract summary: Single-image depth estimation is essential for endoscopy tasks such as localization, reconstruction, and augmented reality. Most existing methods in surgical scenes focus on in-domain depth estimation, limiting their real-world applicability. We present EndoOmni, the first foundation model for zero-shot cross-domain depth estimation for endoscopy.
- Score: 4.99086145037811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-image depth estimation is essential for endoscopy tasks such as localization, reconstruction, and augmented reality. Most existing methods in surgical scenes focus on in-domain depth estimation, limiting their real-world applicability. This constraint stems from the scarcity and inferior labeling quality of medical data for training. In this work, we present EndoOmni, the first foundation model for zero-shot cross-domain depth estimation for endoscopy. To harness the potential of diverse training data, we refine the advanced self-learning paradigm that employs a teacher model to generate pseudo-labels, guiding a student model trained on large-scale labeled and unlabeled data. To address training disturbance caused by inherent noise in depth labels, we propose a robust training framework that leverages both depth labels and estimated confidence from the teacher model to jointly guide the student model training. Moreover, we propose a weighted scale-and-shift invariant loss to adaptively adjust learning weights based on label confidence, thus imposing a learning bias towards cleaner label pixels while reducing the influence of highly noisy pixels. Experiments on zero-shot relative depth estimation show that our EndoOmni improves over state-of-the-art methods in medical imaging by 33% and over existing foundation models by 34% in terms of absolute relative error on specific datasets. Furthermore, our model provides strong initialization for fine-tuning metric depth estimation, maintaining superior performance in both in-domain and out-of-domain scenarios. The source code is publicly available at https://github.com/TianCuteQY/EndoOmni.
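To make the loss design concrete, below is a minimal PyTorch-style sketch of a confidence-weighted scale-and-shift invariant loss, assuming per-pixel teacher confidences in [0, 1] are used as weights both in the least-squares alignment of prediction to label and in the final error term. The function name, tensor shapes, and the choice of an L1 error are illustrative assumptions, not the paper's exact formulation, which is available in the linked repository.

```python
import torch


def weighted_ssi_loss(pred, target, confidence, valid_mask=None, eps=1e-6):
    """Hypothetical sketch of a confidence-weighted scale-and-shift invariant loss.

    pred, target, confidence: (B, H, W) tensors; confidence comes from the
    teacher model and down-weights pixels whose depth labels are likely noisy.
    """
    if valid_mask is None:
        valid_mask = torch.ones_like(target, dtype=torch.bool)

    b = pred.shape[0]
    pred = pred.reshape(b, -1)
    target = target.reshape(b, -1)
    w = confidence.reshape(b, -1) * valid_mask.reshape(b, -1).float()

    # Closed-form weighted least squares: per-image scale s and shift t
    # minimizing sum_i w_i * (s * pred_i + t - target_i)^2.
    sum_w = w.sum(dim=1, keepdim=True).clamp_min(eps)
    mean_p = (w * pred).sum(dim=1, keepdim=True) / sum_w
    mean_t = (w * target).sum(dim=1, keepdim=True) / sum_w
    var_p = (w * (pred - mean_p) ** 2).sum(dim=1, keepdim=True) / sum_w
    cov_pt = (w * (pred - mean_p) * (target - mean_t)).sum(dim=1, keepdim=True) / sum_w
    scale = cov_pt / var_p.clamp_min(eps)
    shift = mean_t - scale * mean_p
    aligned = scale * pred + shift

    # Confidence-weighted absolute error: high-confidence (cleaner) pixels
    # dominate the gradient, highly noisy pixels are suppressed.
    per_image = (w * (aligned - target).abs()).sum(dim=1) / sum_w.squeeze(1)
    return per_image.mean()
```

In a teacher-student setup of the kind described above, the same loss could plausibly be applied to labeled images (weighting noisy depth labels by teacher confidence) and to unlabeled images (treating the teacher's pseudo-labels as targets), though how EndoOmni combines the two terms should be checked against the released code.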
Related papers
- EndoMUST: Monocular Depth Estimation for Robotic Endoscopy via End-to-end Multi-step Self-supervised Training [0.7499722271664147]
A novel framework with multi-step efficient fine-tuning is proposed in this work. Based on parameter-efficient fine-tuning of the foundation model, the proposed method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-06-19T04:31:59Z) - Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation [2.795503750654676]
We propose a transfer learning framework that leverages synthetic data with depth labels for training and adapts domain knowledge for accurate depth estimation in real bronchoscope data.
Our network demonstrates improved depth prediction on real footage using domain adaptation compared to training solely on synthetic data, validating our approach.
arXiv Detail & Related papers (2024-11-07T03:48:35Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Tackling the Incomplete Annotation Issue in Universal Lesion Detection Task By Exploratory Training [10.627977735890191]
Universal lesion detection has great value for clinical practice as it aims to detect lesions in multiple organs on medical images.
Deep learning methods have shown promising results but demand large volumes of annotated data for training.
We introduce a teacher-student detection model as the basis, where the teacher's predictions are combined with incomplete annotations to train the student.
arXiv Detail & Related papers (2023-09-23T08:44:07Z) - Synthetic Augmentation with Large-scale Unconditional Pre-training [4.162192894410251]
We propose a synthetic augmentation method called HistoDiffusion to reduce the dependency on annotated data.
HistoDiffusion can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets.
arXiv Detail & Related papers (2023-08-08T03:34:04Z) - Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving.
We introduce a novel method for training machine learning models in the presence of noisy labels.
Our results show that our method consistently outperforms the existing state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-16T16:43:24Z) - Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Adapting the Mean Teacher for keypoint-based lung registration under geometric domain shifts [75.51482952586773]
Deep neural networks generally require plenty of labeled training data and are vulnerable to domain shifts between training and test data.
We present a novel approach to geometric domain adaptation for image registration, adapting a model from a labeled source to an unlabeled target domain.
Our method consistently improves on the baseline model by 50%/47%, even matching the accuracy of models trained on target data.
arXiv Detail & Related papers (2022-07-01T12:16:42Z) - Pre-training via Denoising for Molecular Property Prediction [53.409242538744444]
We describe a pre-training technique that utilizes large datasets of 3D molecular structures at equilibrium.
Inspired by recent advances in noise regularization, our pre-training objective is based on denoising.
arXiv Detail & Related papers (2022-05-31T22:28:34Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - A generic ensemble based deep convolutional neural network for semi-supervised medical image segmentation [7.141405427125369]
We propose a generic semi-supervised learning framework for image segmentation based on a deep convolutional neural network (DCNN).
Our method is able to significantly improve beyond fully supervised model learning by incorporating unlabeled data.
arXiv Detail & Related papers (2020-04-16T23:41:50Z)