Multi-modal 3D Pose and Shape Estimation with Computed Tomography
- URL: http://arxiv.org/abs/2503.19405v1
- Date: Tue, 25 Mar 2025 07:24:58 GMT
- Title: Multi-modal 3D Pose and Shape Estimation with Computed Tomography
- Authors: Mingxiao Tu, Hoijoon Jung, Alireza Moghadam, Jineel Raythatha, Lachlan Allan, Jeremy Hsu, Andre Kyme, Jinman Kim
- Abstract summary: We present the first multi-modal in-bed patient 3D pose and shape estimation network that fuses detailed geometric features extracted from CT scans with depth maps. mPSE-CT robustly reconstructs occluded body regions and enhances the accuracy of the estimated 3D human mesh model.
- Score: 6.614339747258751
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In perioperative care, precise in-bed 3D patient pose and shape estimation (PSE) can be vital in optimizing patient positioning in preoperative planning, enabling accurate overlay of medical images for augmented reality-based surgical navigation, and mitigating risks of prolonged immobility during recovery. Conventional PSE methods relying on modalities such as RGB-D, infrared, or pressure maps often struggle with occlusions caused by bedding and complex patient positioning, leading to inaccurate estimation that can affect clinical outcomes. To address these challenges, we present the first multi-modal in-bed patient 3D PSE network that fuses detailed geometric features extracted from routinely acquired computed tomography (CT) scans with depth maps (mPSE-CT). mPSE-CT incorporates a shape estimation module that utilizes probabilistic correspondence alignment, a pose estimation module with a refined neural network, and a final parameters mixing module. This multi-modal network robustly reconstructs occluded body regions and enhances the accuracy of the estimated 3D human mesh model. We validated mPSE-CT using proprietary whole-body rigid phantom and volunteer datasets in clinical scenarios. mPSE-CT outperformed the best-performing prior method by 23% and 49.16% in pose and shape estimation respectively, demonstrating its potential for improving clinical outcomes in challenging perioperative environments.
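The abstract names three components (CT-driven shape estimation via probabilistic correspondence alignment, depth-driven pose estimation with a refined network, and a final parameters mixing module) but gives no implementation details. Below is a minimal PyTorch sketch of how such a two-branch depth/CT fusion could be wired; the layer sizes, the SMPL-style 72-pose/10-shape parameterisation, and the scalar mixing gate are assumptions made for illustration, not the authors' architecture.

```python
# Illustrative sketch only: module sizes, the SMPL-style parameterisation
# (72 pose / 10 shape values), and the scalar mixing gate are assumptions.
import torch
import torch.nn as nn


def conv2d_encoder(out_dim: int) -> nn.Sequential:
    """Tiny 2D encoder for a single-channel depth map."""
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )


def conv3d_encoder(out_dim: int) -> nn.Sequential:
    """Tiny 3D encoder for a single-channel CT volume."""
    return nn.Sequential(
        nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv3d(16, out_dim, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    )


class MPSECTSketch(nn.Module):
    """Two-branch fusion: pose from depth, shape from CT and depth, then mixing."""

    def __init__(self, pose_dim: int = 72, shape_dim: int = 10):
        super().__init__()
        self.depth_enc = conv2d_encoder(64)
        self.ct_enc = conv3d_encoder(64)
        self.pose_head = nn.Linear(64, pose_dim)          # depth -> pose
        self.ct_shape_head = nn.Linear(64, shape_dim)     # CT    -> shape
        self.depth_shape_head = nn.Linear(64, shape_dim)  # depth -> coarse shape
        self.mix_gate = nn.Parameter(torch.tensor(0.5))   # stand-in "mixing" module

    def forward(self, depth: torch.Tensor, ct: torch.Tensor):
        # depth: (B, 1, H, W); ct: (B, 1, D, H, W)
        d_feat = self.depth_enc(depth)
        c_feat = self.ct_enc(ct)
        pose = self.pose_head(d_feat)
        w = torch.sigmoid(self.mix_gate)
        shape = w * self.ct_shape_head(c_feat) + (1 - w) * self.depth_shape_head(d_feat)
        return pose, shape


if __name__ == "__main__":
    model = MPSECTSketch()
    pose, shape = model(torch.randn(2, 1, 128, 128), torch.randn(2, 1, 32, 64, 64))
    print(pose.shape, shape.shape)  # torch.Size([2, 72]) torch.Size([2, 10])
```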
Related papers
- Abnormality-Driven Representation Learning for Radiology Imaging [0.8321462983924758]
We introduce lesion-enhanced contrastive learning (LeCL), a novel approach to obtain visual representations driven by abnormalities in 2D axial slices across different locations of the CT scans.
We evaluate our approach across three clinical tasks: tumor lesion location, lung disease detection, and patient staging, benchmarking against four state-of-the-art foundation models.
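The summary only names the contrastive objective; as a point of reference, a generic InfoNCE (NT-Xent) loss over paired slice embeddings is sketched below. Treating two augmented views of the same lesion-containing slice as a positive pair is an assumption here, not necessarily the paper's LeCL formulation.

```python
# Generic InfoNCE / NT-Xent loss over paired slice embeddings, shown only to
# illustrate abnormality-driven contrastive learning; NOT the LeCL objective.
import torch
import torch.nn.functional as F


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two augmented views of the same
    lesion-containing CT slice (assumed positive pairs)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    # Diagonal entries are positives; all other slices in the batch are negatives.
    return F.cross_entropy(logits, targets)


# Usage sketch: the embeddings would come from any 2D slice encoder.
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```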
arXiv Detail & Related papers (2024-11-25T13:53:26Z)
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion [32.71972792352939]
3D patient body modeling is critical to the success of automated patient positioning for smart medical scanning and operating rooms.
Existing CNN-based end-to-end patient modeling solutions typically require customized network designs demanding large amounts of relevant training data.
We propose a generic, modularized 3D patient modeling method consisting of a multi-modal keypoint detection module with attentive fusion for 2D patient joint localization.
We demonstrate the efficacy of the proposed method by extensive patient positioning experiments on both public and clinical data.
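The entry mentions a multi-modal keypoint detection module with attentive fusion. A minimal sketch of one common pattern, softmax-gated fusion of per-modality joint heatmaps, is given below; the class name, gating scheme, and tensor shapes are assumptions, not the cited method.

```python
# Generic attentive fusion of per-modality keypoint heatmaps (e.g. RGB and
# depth).  A common gating pattern, not necessarily the cited paper's fusion.
import torch
import torch.nn as nn


class AttentiveHeatmapFusion(nn.Module):
    def __init__(self, num_modalities: int = 2, num_joints: int = 17):
        super().__init__()
        # One attention logit per modality and joint, predicted from pooled heatmap peaks.
        self.attn = nn.Linear(num_modalities * num_joints, num_modalities * num_joints)
        self.num_modalities, self.num_joints = num_modalities, num_joints

    def forward(self, heatmaps: torch.Tensor) -> torch.Tensor:
        # heatmaps: (B, M, J, H, W) -- per-modality joint heatmaps
        b, m, j, h, w = heatmaps.shape
        pooled = heatmaps.amax(dim=(-2, -1)).flatten(1)            # (B, M*J) peak responses
        weights = torch.softmax(self.attn(pooled).view(b, m, j), dim=1)  # normalise over modalities
        fused = (weights[..., None, None] * heatmaps).sum(dim=1)   # (B, J, H, W)
        return fused


fused = AttentiveHeatmapFusion()(torch.rand(2, 2, 17, 64, 64))
print(fused.shape)  # torch.Size([2, 17, 64, 64])
```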
arXiv Detail & Related papers (2024-03-05T18:58:55Z)
- On the Localization of Ultrasound Image Slices within Point Distribution Models [84.27083443424408]
Thyroid disorders are most commonly diagnosed using high-resolution Ultrasound (US).
Longitudinal tracking is a pivotal diagnostic protocol for monitoring changes in pathological thyroid morphology.
We present a framework for automated US image slice localization within a 3D shape representation.
arXiv Detail & Related papers (2023-09-01T10:10:46Z)
- Comprehensive Validation of Automated Whole Body Skeletal Muscle, Adipose Tissue, and Bone Segmentation from 3D CT images for Body Composition Analysis: Towards Extended Body Composition [0.6176955945418618]
Powerful artificial intelligence tools such as deep learning now make it feasible to segment entire 3D images and generate accurate measurements of all internal anatomy, overcoming the severe bottleneck that existed previously, namely the need for manual segmentation.
Such measurements were hitherto unavailable, limiting the field to a very small subset.
arXiv Detail & Related papers (2021-06-01T17:30:45Z)
- Deep Implicit Statistical Shape Models for 3D Medical Image Delineation [47.78425002879612]
3D delineation of anatomical structures is a cardinal goal in medical imaging analysis.
Prior to deep learning, statistical shape models that imposed anatomical constraints and produced high quality surfaces were a core technology.
We present deep implicit statistical shape models (DISSMs), a new approach to delineation that marries the representation power of CNNs with the robustness of SSMs.
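For context, the standard building block of deep implicit shape models is a DeepSDF-style decoder that maps a shape latent code and a 3D query point to a signed distance; a minimal sketch follows. The latent size, depth, and training details of the actual DISSM differ and are not reproduced here.

```python
# DeepSDF-style implicit shape decoder: a generic building block for deep
# implicit shape models, shown only as an illustration of the idea.
import torch
import torch.nn as nn


class ImplicitShapeDecoder(nn.Module):
    """Maps a shape latent code plus a query point (x, y, z) to a signed
    distance; the zero level set defines the organ surface."""

    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, latent: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # latent: (B, latent_dim); points: (B, N, 3) query coordinates
        z = latent[:, None, :].expand(-1, points.shape[1], -1)
        return self.net(torch.cat([z, points], dim=-1)).squeeze(-1)  # (B, N) SDF values


sdf = ImplicitShapeDecoder()(torch.randn(2, 128), torch.rand(2, 1024, 3))
print(sdf.shape)  # torch.Size([2, 1024])
```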
arXiv Detail & Related papers (2021-04-07T01:15:06Z)
- Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
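The pseudo-3D idea such networks build on factors a 3D kernel into an in-plane 2D convolution followed by a through-plane 1D convolution, so that 2D pre-trained weights can be reused. A minimal sketch of that factorisation (not the exact MP3D FPN block) is shown below.

```python
# Pseudo-3D convolution: factor a 3D kernel into an in-plane 2D convolution
# (1x3x3) followed by a through-plane 1D convolution (3x1x1).  This is the
# general P3D idea; the exact MP3D FPN block in the cited paper differs.
import torch
import torch.nn as nn


class PseudoConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Spatial (axial-slice) convolution: kernel 1x3x3.  Its weights could be
        # initialised from a pre-trained 2D network by inserting a depth axis.
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Through-plane convolution: kernel 3x1x1 mixes context across slices.
        self.depthwise = nn.Conv3d(out_ch, out_ch, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) stack of CT slices
        return self.act(self.depthwise(self.act(self.spatial(x))))


y = PseudoConv3d(1, 16)(torch.randn(1, 1, 9, 64, 64))
print(y.shape)  # torch.Size([1, 16, 9, 64, 64])
```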
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
- Tattoo tomography: Freehand 3D photoacoustic image reconstruction with an optical pattern [49.240017254888336]
Photoacoustic tomography (PAT) is a novel imaging technique that can resolve both morphological and functional tissue properties.
A current drawback is the limited field-of-view provided by the conventionally applied 2D probes.
We present a novel approach to 3D reconstruction of PAT data that does not require an external tracking system.
arXiv Detail & Related papers (2020-11-10T09:27:56Z)
- Probabilistic 3D surface reconstruction from sparse MRI information [58.14653650521129]
We present a novel probabilistic deep learning approach for concurrent 3D surface reconstruction from sparse 2D MR image data and aleatoric uncertainty prediction.
Our method is capable of reconstructing large surface meshes from three quasi-orthogonal MR imaging slices, even with limited training sets.
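A standard way to obtain aleatoric uncertainty in such regression settings is to predict a mean and a log-variance per vertex and train with a Gaussian negative log-likelihood; the generic loss is sketched below as an illustration, not as the paper's specific probabilistic model.

```python
# Generic heteroscedastic (aleatoric) regression loss: the network predicts a
# mean and a log-variance per vertex coordinate and is trained with a Gaussian
# negative log-likelihood.  A standard recipe, not the cited paper's model.
import torch


def gaussian_nll(mean: torch.Tensor, log_var: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """mean, log_var, target: (B, V, 3) predicted vertex means, log-variances,
    and ground-truth vertex positions."""
    # 0.5 * [ (y - mu)^2 / sigma^2 + log sigma^2 ], averaged over all elements
    return 0.5 * (((target - mean) ** 2) * torch.exp(-log_var) + log_var).mean()


loss = gaussian_nll(torch.randn(2, 500, 3), torch.zeros(2, 500, 3), torch.randn(2, 500, 3))
```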
arXiv Detail & Related papers (2020-10-05T14:18:52Z)
- Deep Volumetric Universal Lesion Detection using Light-Weight Pseudo 3D Convolution and Surface Point Regression [23.916776570010285]
Computer-aided lesion/significant-findings detection techniques are at the core of medical imaging.
We propose a novel deep anchor-free one-stage volumetric universal lesion detection (VULD) framework that incorporates (1) pseudo-3D convolution (P3DC) operators to recycle the architectural configurations and pre-trained weights of off-the-shelf 2D networks and (2) a new surface point regression (SPR) method to effectively regress 3D lesion spatial extents by pinpointing representative key points on lesion surfaces.
arXiv Detail & Related papers (2020-08-30T19:42:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.