ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
- URL: http://arxiv.org/abs/2506.13195v1
- Date: Mon, 16 Jun 2025 08:01:14 GMT
- Title: ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs
- Authors: Bikram Keshari Parida, Anusree P. Sunilkumar, Abhijit Sen, Wonsang You
- Abstract summary: Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) and Cone-Beam Computed Tomography (CBCT). While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX) providing 2D oral cavity representations, and Cone-Beam Computed Tomography (CBCT) offering detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but presents drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays that eliminates intermediate density aggregation required by existing models due to intersecting rays, reducing sampling point computations by $52 \%$, (3) replacing CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.
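The abstract names the Beer-Lambert law and non-intersecting rays but does not spell out the computation. As a rough illustration only, the following sketch shows the standard discrete Beer-Lambert attenuation, I = I0 * exp(-sum(mu_k * step)), applied independently to each ray. It is not the authors' implementation: the function names, the toy attenuation values, and the uniform step size are all assumptions made for this example.

```python
import math

def beer_lambert_intensity(densities, step, i0=1.0):
    """Discrete Beer-Lambert law for one ray: I = I0 * exp(-sum(mu_k * step)).

    densities: attenuation coefficients sampled along the ray (hypothetical values)
    step: spacing between consecutive sample points (assumed uniform)
    """
    optical_depth = sum(mu * step for mu in densities)
    return i0 * math.exp(-optical_depth)

def project_rays(ray_samples, step, i0=1.0):
    """Return one detector value per ray.

    Each ray is processed on its own, which is only valid when rays do not
    share sample points (the non-intersecting case).
    """
    return [beer_lambert_intensity(ray, step, i0) for ray in ray_samples]

# Hypothetical toy data: 3 rays with 4 sample points each.
rays = [
    [0.0, 0.0, 0.0, 0.0],  # no attenuation along the ray
    [0.1, 0.2, 0.2, 0.1],  # weakly attenuating material
    [0.5, 1.0, 1.0, 0.5],  # strongly attenuating material
]
pixels = project_rays(rays, step=0.5)
```

In this toy setup, denser material along a ray yields a lower transmitted intensity, which is the relationship a Beer-Lambert-based model inverts when it predicts densities from a radiograph.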
Related papers
- DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction [9.579390210009521]
Sparse-view reconstruction reduces radiation by using fewer X-ray projections while maintaining image quality. Existing methods face challenges such as high computational demands and poor generalizability to different datasets. We propose DeepSparse, the first foundation model for sparse-view CBCT reconstruction, featuring DiCE, a novel network that integrates multi-view 2D features and multi-scale 3D features.
arXiv Detail & Related papers (2025-05-05T13:14:49Z)
- PX2Tooth: Reconstructing the 3D Point Cloud Teeth from a Single Panoramic X-ray [20.913080797758816]
We propose PX2Tooth, a novel approach to reconstruct 3D teeth using a single PX image with a two-stage framework.
First, we design the PXSegNet to segment the permanent teeth from the PX images, providing clear positional, morphological, and categorical information for each tooth.
Subsequently, we design a novel tooth generation network (TGNet) that learns to transform random point clouds into 3D teeth.
arXiv Detail & Related papers (2024-11-06T07:44:04Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
- SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks [6.624839896733912]
This paper presents a new self-driven generative adversarial network model (SdCT-GAN) for reconstruction of 3D CT images.
The model is encouraged to pay more attention to image details by introducing a novel auto-encoder structure in the discriminator.
The LPIPS evaluation metric is adopted, which can quantitatively evaluate the fine contours and textures of reconstructed images better than existing metrics.
arXiv Detail & Related papers (2023-09-10T08:16:02Z)
- Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction [53.93674177236367]
Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging.
Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image.
This has led to a growing interest in sparse-view CBCT reconstruction to reduce radiation doses.
We introduce a novel geometry-aware encoder-decoder framework to solve this problem.
arXiv Detail & Related papers (2023-03-26T14:38:42Z)
- Oral-3Dv2: 3D Oral Reconstruction from Panoramic X-Ray Imaging with Implicit Neural Representation [3.8215162658168524]
Oral-3Dv2 is a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image.
Our model learns to represent the 3D oral structure in an implicit way by mapping 2D coordinates into density values of voxels in the 3D space.
To the best of our knowledge, this is the first work of a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image.
arXiv Detail & Related papers (2023-03-21T18:17:27Z)
- Metal Artifact Reduction with Intra-Oral Scan Data for 3D Low Dose Maxillofacial CBCT Modeling [0.7444835592104696]
A two-stage metal artifact reduction method is proposed for accurate 3D low-dose maxillofacial CBCT modeling.
In the first stage, an image-to-image deep learning network is employed to mitigate metal-related artifacts.
In the second stage, a 3D maxillofacial model is constructed by segmenting the bones from the corrected dental CBCT image.
arXiv Detail & Related papers (2022-02-08T00:24:41Z)
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
- A Learning-based Method for Online Adjustment of C-arm Cone-Beam CT Source Trajectories for Artifact Avoidance [47.345403652324514]
The reconstruction quality attainable with commercial CBCT devices is insufficient due to metal artifacts in the presence of pedicle screws.
We propose to adjust the C-arm CBCT source trajectory during the scan to optimize reconstruction quality with respect to a certain task.
We demonstrate that convolutional neural networks trained on realistically simulated data are capable of predicting quality metrics that enable scene-specific adjustments of the CBCT source trajectory.
arXiv Detail & Related papers (2020-08-14T09:23:50Z)
- Deep Q-Network-Driven Catheter Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning and Dual-UNet [74.22397862400177]
We propose a novel catheter segmentation approach, which requires fewer annotations than the supervised learning method.
Our scheme uses deep Q-learning as the pre-localization step, which avoids voxel-level annotation.
With the detected catheter, patch-based Dual-UNet is applied to segment the catheter in 3D volumetric data.
arXiv Detail & Related papers (2020-06-25T21:10:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.