Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video
- URL: http://arxiv.org/abs/2408.10153v1
- Date: Mon, 19 Aug 2024 17:02:16 GMT
- Title: Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video
- Authors: Shuxian Wang, Akshay Paruchuri, Zhaoxi Zhang, Sarah McGill, Roni Sengupta
- Abstract summary: We propose a pipeline of structure-preserving synthetic-to-real (sim2real) image translation.
This allows us to generate large quantities of realistic-looking synthetic images for supervised depth estimation.
We also propose a dataset of hand-picked sequences from clinical colonoscopies to improve the image translation process.
- Score: 1.0485739694839669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation in colonoscopy video aims to overcome the unusual lighting properties of the colonoscopic environment. One of the major challenges in this area is the domain gap between annotated but unrealistic synthetic data and unannotated but realistic clinical data. Previous attempts to bridge this domain gap directly target the depth estimation task itself. We propose a general pipeline of structure-preserving synthetic-to-real (sim2real) image translation (producing a modified version of the input image) to retain depth geometry through the translation process. This allows us to generate large quantities of realistic-looking synthetic images for supervised depth estimation with improved generalization to the clinical domain. We also propose a dataset of hand-picked sequences from clinical colonoscopies to improve the image translation process. We demonstrate the simultaneous realism of the translated images and preservation of depth maps via the performance of downstream depth estimation on various datasets.
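To make the pipeline concrete, below is a minimal sketch of the training setup the abstract describes, not the authors' implementation: a frozen sim2real translator is assumed to preserve scene geometry, so each translated synthetic image can be paired with its original synthetic depth map for supervised learning. The network, the identity placeholder translator, and all hyperparameters are hypothetical.
```python
# Minimal sketch (not the authors' code): supervised depth training on
# sim2real-translated synthetic images. Assumption: the frozen translator
# preserves scene geometry, so the original synthetic depth map remains a
# valid label for the translated image.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Hypothetical stand-in for a monocular depth estimation network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def train_step(depth_net, translator, opt, synth_img, synth_depth):
    """One supervised step on a translated synthetic image."""
    with torch.no_grad():
        realistic_img = translator(synth_img)  # structure-preserving sim2real translation
    pred = depth_net(realistic_img)
    loss = nn.functional.l1_loss(pred, synth_depth)  # synthetic depth label reused unchanged
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    depth_net = TinyDepthNet()
    translator = nn.Identity()  # placeholder for a trained structure-preserving translator
    opt = torch.optim.Adam(depth_net.parameters(), lr=1e-4)
    synth_img = torch.rand(2, 3, 64, 64)
    synth_depth = torch.rand(2, 1, 64, 64)
    print(train_step(depth_net, translator, opt, synth_img, synth_depth))
```
In practice the translator would be a trained structure-preserving generator rather than an identity map; the point of the sketch is that the synthetic depth labels carry over to the translated images without modification.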
Related papers
- Enhancing Bronchoscopy Depth Estimation through Synthetic-to-Real Domain Adaptation [2.795503750654676]
We propose a transfer learning framework that leverages synthetic data with depth labels for training and adapts domain knowledge for accurate depth estimation in real bronchoscope data.
Our network demonstrates improved depth prediction on real footage using domain adaptation compared to training solely on synthetic data, validating our approach.
arXiv Detail & Related papers (2024-11-07T03:48:35Z)
- ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation [67.22294293695255]
We propose a novel reconstruction pipeline with a bi-directional adaptation architecture named ToDER to obtain precise depth estimates.
Experimental results demonstrate that our approach can precisely predict depth maps in both realistic and synthetic colonoscopy videos.
arXiv Detail & Related papers (2024-07-23T14:24:26Z)
- Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use the quadric assumption to rectify noisy depth estimates from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
arXiv Detail & Related papers (2024-03-12T23:27:30Z)
- Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
- Leveraging a realistic synthetic database to learn Shape-from-Shading for estimating the colon depth in colonoscopy images [0.20482269513546453]
This work introduces a novel methodology to estimate colon depth maps in single frames from monocular colonoscopy videos.
The generated depth map is inferred from the shading variation of the colon wall with respect to the light source.
A classic convolutional neural network architecture is trained from scratch to estimate the depth map.
arXiv Detail & Related papers (2023-11-08T21:14:56Z)
- SoftEnNet: Symbiotic Monocular Depth Estimation and Lumen Segmentation for Colonoscopy Endorobots [2.9696400288366127]
Colorectal cancer is the third most common cause of cancer death worldwide.
A vision-based autonomous endorobot can improve colonoscopy procedures significantly.
arXiv Detail & Related papers (2023-01-19T16:22:17Z)
- Depth Estimation from Single-shot Monocular Endoscope Image Using Image Domain Adaptation And Edge-Aware Depth Estimation [1.7086737326992167]
We propose a depth estimation method from a single-shot monocular endoscopic image using Lambertian surface translation by domain adaptation and depth estimation using multi-scale edge loss.
The texture and specular reflection on the surface of an organ reduce the accuracy of depth estimations.
We applied the estimated depth images to automated anatomical location identification of colonoscopic images using a convolutional neural network.
arXiv Detail & Related papers (2022-01-12T14:06:54Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation.
Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner.
The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z)
- Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
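For the Deep CG2Real entry above, here is a rough sketch of the disentanglement idea under the common intrinsic-image assumption image ≈ albedo × shading. The refiner network, the layer inputs, and the recomposition are hypothetical placeholders for illustration, not the paper's two-stage architecture.
```python
# Rough sketch (assumption-laden, not the Deep CG2Real architecture):
# operate on disentangled intrinsic layers, with image ~= albedo * shading.
import torch
import torch.nn as nn

class ShadingRefiner(nn.Module):
    """Hypothetical network that maps a synthetic shading layer toward a more
    realistic one; it could be supervised with physically-based renderings."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, shading):
        return self.net(shading)

def recompose(albedo, shading, refiner):
    """Refine only the shading layer here and recompose the image, so the
    albedo (texture/material layout) passes through unchanged."""
    refined = refiner(shading)
    return albedo * refined  # intrinsic-image recomposition

if __name__ == "__main__":
    albedo = torch.rand(1, 3, 64, 64)   # per-pixel reflectance layer (e.g. from the renderer)
    shading = torch.rand(1, 1, 64, 64)  # grayscale shading layer (e.g. from the renderer)
    out = recompose(albedo, shading, ShadingRefiner())
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```
This sketch modifies only the shading layer, whereas the paper operates on both disentangled layers; the shared point is that working on intrinsic layers constrains the translation far more than unpaired translation directly in image space.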
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.