MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
- URL: http://arxiv.org/abs/2504.01428v1
- Date: Wed, 02 Apr 2025 07:28:09 GMT
- Title: MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
- Authors: Zhuangzhuang Chen, Hualiang Wang, Chubin Ou, Xiaomeng Li
- Abstract summary: We propose a multi-view Tri-alignment framework for OCT to OCTA 3D image translation in discrete and finite space, named MuTri. We also collect the first large-scale dataset, namely OCTA2024, which contains paired OCT and OCTA volumes from 846 subjects.
- Score: 8.48045976269756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical coherence tomography angiography (OCTA) is of great importance for imaging microvascular networks, as it provides accurate 3D imaging of blood vessels, but it relies on specialized sensors and expensive devices. For this reason, previous works have shown the potential of translating readily available 3D Optical Coherence Tomography (OCT) images into 3D OCTA images. However, existing OCTA translation methods directly learn the mapping from the OCT domain to the OCTA domain in continuous and infinite space with guidance from only a single view, i.e., the OCTA projection map, resulting in suboptimal results. To this end, we propose the multi-view Tri-alignment framework for OCT to OCTA 3D image translation in discrete and finite space, named MuTri. In the first stage, we pre-train two vector-quantized variational auto-encoders (VQ-VAEs) by reconstructing 3D OCT and 3D OCTA data, providing semantic priors for the subsequent multi-view guidance. In the second stage, our multi-view tri-alignment facilitates another VQ-VAE model in learning the mapping from the OCT domain to the OCTA domain in discrete and finite space. Specifically, a contrastive-inspired semantic alignment is proposed to maximize the mutual information with the pre-trained models from the OCT and OCTA views, facilitating codebook learning. Meanwhile, a vessel structure alignment is proposed to minimize the structure discrepancy with the pre-trained model from the OCTA projection map view, which benefits from learning detailed vessel structure information. We also collect the first large-scale dataset, namely OCTA2024, which contains paired OCT and OCTA volumes from 846 subjects.
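A minimal PyTorch sketch of the two stage-two alignment objectives, inferred from the abstract alone, is given below; the function names, the InfoNCE-style contrastive formulation, the mean projection along depth, and the L1 penalty are illustrative assumptions rather than the paper's confirmed implementation.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(z_trans, z_oct_prior, z_octa_prior, tau=0.07):
    """Contrastive-inspired semantic alignment (sketch): pull the translator's
    latent toward the latents of the pre-trained OCT and OCTA VQ-VAEs, as an
    InfoNCE-style proxy for maximizing mutual information."""
    z = F.normalize(z_trans.flatten(1), dim=1)
    targets = torch.arange(z.size(0), device=z.device)  # matched pairs are positives
    loss = z.new_zeros(())
    for prior in (z_oct_prior, z_octa_prior):
        p = F.normalize(prior.flatten(1), dim=1)
        logits = z @ p.t() / tau                        # (B, B) cosine similarities
        loss = loss + F.cross_entropy(logits, targets)
    return loss

def vessel_structure_loss(pred_vol, octa_prior_vol):
    """Vessel structure alignment (sketch): compare en-face projection maps,
    here a mean projection along the depth axis of (B, C, D, H, W) volumes,
    and penalize their discrepancy with an L1 loss."""
    return F.l1_loss(pred_vol.mean(dim=2), octa_prior_vol.mean(dim=2))

# Toy usage with random tensors, just to show the expected shapes.
z_t, z_o, z_a = (torch.randn(2, 128) for _ in range(3))
pred, prior = torch.randn(2, 1, 8, 32, 32), torch.randn(2, 1, 8, 32, 32)
total = semantic_alignment_loss(z_t, z_o, z_a) + vessel_structure_loss(pred, prior)
```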
Related papers
- DVG-Diffusion: Dual-View Guided Diffusion Model for CT Reconstruction from X-Rays [32.55527512602604]
We facilitate complex 2D X-ray to 3D CT mapping by incorporating new view synthesis and reduce the learning difficulty through view-guided feature alignment. Specifically, we propose a dual-view guided diffusion model (DVG-Diffusion), which couples a real input X-ray view and a synthesized new X-ray view to jointly guide CT reconstruction.
arXiv Detail & Related papers (2025-03-22T16:03:18Z)
- SS-CTML: Self-Supervised Cross-Task Mutual Learning for CT Image Reconstruction [14.197733437855169]
Supervised deep-learning (SDL) techniques with paired training datasets have been widely studied for X-ray computed tomography (CT) image reconstruction. We propose a self-supervised cross-task mutual learning (SS-CTML) framework for CT image reconstruction.
arXiv Detail & Related papers (2024-12-31T04:32:46Z)
- OCTCube-M: A 3D multimodal optical coherence tomography foundation model for retinal and systemic diseases with cross-cohort and cross-device validation [20.17448729403084]
We present OCTCube-M, a 3D OCT-based multi-modal foundation model for jointly analyzing OCT and en face images. OCTCube-M first developed OCTCube, a 3D foundation model pre-trained on 26,685 3D OCT volumes encompassing 1.62 million 2D OCT images. It exploits a novel multi-modal contrastive learning framework, COEP, to integrate other retinal imaging modalities, such as fundus autofluorescence and infrared retinal imaging.
arXiv Detail & Related papers (2024-08-20T22:55:19Z)
- CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging [78.734927709231]
Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements.
Due to ill-posedness, implicit neural representation (INR) techniques may leave considerable "holes" (i.e., unmodeled spaces) in their fields, leading to sub-optimal results.
We propose the Coordinate-based Continuous Projection Field (CoCPF), which aims to build hole-free representation fields for SVCT reconstruction.
arXiv Detail & Related papers (2024-06-21T08:38:30Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest powerful universal segmentation and large language models to extend the original datasets.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios [53.94122089629544]
We introduce CT-GLIP (Grounded Language-Image Pretraining with CT scans), a novel method that constructs organ-level image-text pairs to enhance multimodal contrastive learning.
Our method, trained on a multimodal CT dataset comprising 44,011 organ-level vision-text pairs from 17,702 patients across 104 organs, demonstrates that it can identify organs and abnormalities in a zero-shot manner using natural language.
arXiv Detail & Related papers (2024-04-23T17:59:01Z)
- Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography [10.110878689623961]
We introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports.
We develop CT-CLIP, a CT-focused contrastive language-image pretraining framework.
We create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes.
arXiv Detail & Related papers (2024-03-26T16:19:56Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Multi-View Vertebra Localization and Identification from CT Images [57.56509107412658]
We propose a multi-view method for vertebra localization and identification from CT images.
We convert the 3D problem into a 2D localization and identification task on different views.
Our method naturally learns multi-view global information.
arXiv Detail & Related papers (2023-07-24T14:43:07Z)
- Vessel-Promoted OCT to OCTA Image Translation by Heuristic Contextual Constraints [28.715207556565638]
We introduce a novel method called TransPro to translate readily available 3D Optical Coherence Tomography images into 3D OCTA images.
Our TransPro method is primarily driven by two novel ideas that have been overlooked by prior work.
Experimental results on two datasets demonstrate that our TransPro outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-13T01:42:29Z)
- Weakly Supervised Volumetric Image Segmentation with Deformed Templates [80.04326168716493]
We propose an approach that is truly weakly supervised in the sense that we only need to provide a sparse set of 3D points on the surface of target objects.
We show that it outperforms a more traditional approach to weak supervision in 3D at a reduced supervision cost.
arXiv Detail & Related papers (2021-06-07T22:09:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.