ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors
- URL: http://arxiv.org/abs/2509.13525v1
- Date: Tue, 16 Sep 2025 20:40:22 GMT
- Title: ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors
- Authors: Romain Hardy, Tyler Berzin, Pranav Rajpurkar
- Abstract summary: ColonCrafter is a diffusion-based depth estimation model that generates temporally consistent depth maps from monocular colonoscopy videos. Our approach learns robust geometric priors from synthetic colonoscopy sequences to generate temporally consistent depth maps.
- Score: 1.9437590375121516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Three-dimensional (3D) scene understanding in colonoscopy presents significant challenges that necessitate automated methods for accurate depth estimation. However, existing depth estimation models for endoscopy struggle with temporal consistency across video sequences, limiting their applicability for 3D reconstruction. We present ColonCrafter, a diffusion-based depth estimation model that generates temporally consistent depth maps from monocular colonoscopy videos. Our approach learns robust geometric priors from synthetic colonoscopy sequences to generate temporally consistent depth maps. We also introduce a style transfer technique that preserves geometric structure while adapting real clinical videos to match our synthetic training domain. ColonCrafter achieves state-of-the-art zero-shot performance on the C3VD dataset, outperforming both general-purpose and endoscopy-specific approaches. Although full trajectory 3D reconstruction remains a challenge, we demonstrate clinically relevant applications of ColonCrafter, including 3D point cloud generation and surface coverage assessment.
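The abstract lists 3D point cloud generation as one application of the predicted depth maps. A common way to turn a single depth map into a point cloud is pinhole back-projection: a pixel $(u, v)$ with depth $z$ maps to $x = (u - c_x)\,z / f_x$ and $y = (v - c_y)\,z / f_y$. What follows is a minimal NumPy sketch of that generic step, not ColonCrafter's own pipeline. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical. It assumes an undistorted pinhole camera and depth that is already on a consistent scale. Real colonoscopes typically have wide-angle lenses with strong distortion, so frames would need to be undistorted first.

```python
import numpy as np


def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project an H x W depth map into an (N, 3) array of camera-frame points.

    Assumes an undistorted pinhole camera with hypothetical intrinsics
    (fx, fy, cx, cy in pixels) and z-depth measured along the optical axis.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y); both grids have shape (h, w).
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Keep only pixels with a finite, positive depth.
    valid = np.isfinite(points[:, 2]) & (points[:, 2] > 0)
    return points[valid]


depth = np.full((4, 6), 2.0)  # hypothetical constant-depth frame
depth[0, 0] = np.nan          # simulate one pixel with no prediction
pts = depth_to_point_cloud(depth, fx=100.0, fy=100.0, cx=3.0, cy=2.0)
print(pts.shape)  # (23, 3)
```

Because each frame is back-projected in its own camera frame, fusing several frames into one cloud would additionally require camera poses. The abstract notes that full-trajectory reconstruction remains a challenge.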
Related papers
- InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting [64.42884719282323]
InpaintHuman is a novel method for generating high-fidelity, complete, and animatable avatars from occluded monocular videos. Our approach employs direct pixel-level supervision to ensure identity fidelity.
Date: 2026-01-05T13:26:02Z
- ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy [18.844097623387974]
Estimating 3D geometry from monocular colonoscopy images is challenging due to non-Lambertian surfaces, moving light sources, and large textureless regions. We present ColonAdapter, a self-supervised fine-tuning framework that adapts geometric foundation models for colonoscopy geometry estimation.
Date: 2025-11-27T09:21:11Z
- G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior [53.762256749551284]
We identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models. Our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios.
Date: 2025-10-14T03:06:28Z
- C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism [1.1531098736643364]
This paper introduces C3VDv2, the second version (v2) of the high-definition Colonoscopy 3D Video dataset. 192 video sequences totaling 169,371 frames were captured by imaging 60 unique, high-fidelity silicone colon phantom segments. Eight simulated screening colonoscopy videos acquired by a gastroenterologist are provided with ground-truth poses. The dataset includes 15 videos with colon deformations for qualitative assessment.
Date: 2025-06-30T17:29:06Z
- Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras [41.985581990753765]
We introduce Endo3DAC, a unified framework for endoscopic scene reconstruction. We design an integrated network capable of simultaneously estimating depth maps, relative poses, and camera intrinsic parameters. Experiments across four endoscopic datasets demonstrate that Endo3DAC significantly outperforms other state-of-the-art methods.
Date: 2025-03-20T07:49:04Z
- Align3R: Aligned Monocular Depth Estimation for Dynamic Videos [50.28715151619659]
We propose a novel video depth estimation method called Align3R to estimate temporally consistent depth maps for a dynamic video. Our key idea is to use the recent DUSt3R model to align estimated monocular depth maps of different timesteps. Experiments demonstrate that Align3R estimates consistent video depth and camera poses for a monocular video, outperforming baseline methods.
Date: 2024-12-04T07:09:59Z
- ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation [67.22294293695255]
We propose a novel reconstruction pipeline with a bi-directional adaptation architecture named ToDER to obtain precise depth estimates. Experimental results demonstrate that our approach can precisely predict depth maps in both realistic and synthetic colonoscopy videos.
Date: 2024-07-23T14:24:26Z
- Endora: Video Generation Models as Endoscopy Simulators [53.72175969751398]
This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We also pioneer the first public benchmark for endoscopy simulation with video generation models. Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
Date: 2024-03-17T00:51:59Z
- DreaMo: Articulated 3D Reconstruction From A Single Casual Video [59.87221439498147]
We study articulated 3D shape reconstruction from a single, casually captured internet video, where the subject's view coverage is incomplete. DreaMo shows promising quality in novel-view rendering, detailed articulated shape reconstruction, and skeleton generation.
Date: 2023-12-05T09:47:37Z
- Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy [0.2995885872626565]
We develop a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator. We demonstrate an improvement of 14.17% in relative error and 10.4% in $\delta_1$ accuracy over BTS, the most accurate state-of-the-art baseline.
Date: 2023-11-30T16:13:17Z
- ColDE: A Depth Estimation Framework for Colonoscopy Reconstruction [27.793186578742088]
In this work, we design a set of training losses to address the particular challenges of colonoscopy data. With these losses, our self-supervised framework, ColDE, produces better depth maps of colonoscopy data.
Date: 2021-11-19T04:44:27Z
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner. Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
Date: 2020-10-26T15:31:52Z
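The multi-task learning entry reports relative error and $\delta_1$ accuracy, and ColonCrafter is evaluated zero-shot on C3VD. Comparisons like these typically rest on standard monocular depth metrics. Below is a minimal NumPy sketch of the common definitions. AbsRel is the mean of $|d - d^*| / d^*$ over valid pixels. $\delta_1$ is the fraction of pixels with $\max(d/d^*, d^*/d) < 1.25$. The helper names `median_scale` and `depth_metrics` are hypothetical. Median scaling is only one convention for scale-ambiguous predictions, and the papers listed may use a different alignment (for example, least-squares scale and shift) or a different validity mask.

```python
import numpy as np


def median_scale(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Rescale a scale-ambiguous prediction so its median matches the ground truth's.

    One common convention; assumes every pixel passed in is valid.
    """
    return pred * (np.median(gt) / np.median(pred))


def depth_metrics(pred: np.ndarray, gt: np.ndarray, threshold: float = 1.25) -> dict:
    """Compute AbsRel and delta_1 over pixels where both depths are positive."""
    mask = (gt > 0) & (pred > 0)
    p, g = pred[mask], gt[mask]
    # Absolute relative error, normalized by the ground-truth depth.
    abs_rel = float(np.mean(np.abs(p - g) / g))
    # delta_1: share of pixels whose prediction is within a factor of `threshold` of the truth.
    ratio = np.maximum(p / g, g / p)
    delta_1 = float(np.mean(ratio < threshold))
    return {"abs_rel": abs_rel, "delta_1": delta_1}


gt = np.array([1.0, 2.0, 4.0, 8.0])
pred = np.array([1.1, 2.0, 3.0, 8.0])
print(depth_metrics(pred, gt))  # delta_1 is 0.75: only the 3.0 vs 4.0 pixel exceeds a 1.25x ratio

# A prediction off by a global factor of 2 scores perfectly after median scaling.
print(depth_metrics(median_scale(0.5 * gt, gt), gt))  # {'abs_rel': 0.0, 'delta_1': 1.0}
```

The second call shows why the alignment convention matters: without rescaling, every pixel of `0.5 * gt` has a ratio of 2, so $\delta_1$ would be 0.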
This list is automatically generated from the titles and abstracts of the papers on this site.