DUE: Dynamic Uncertainty-Aware Explanation Supervision via 3D Imputation
- URL: http://arxiv.org/abs/2403.10831v1
- Date: Sat, 16 Mar 2024 06:49:32 GMT
- Title: DUE: Dynamic Uncertainty-Aware Explanation Supervision via 3D Imputation
- Authors: Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, Liang Zhao
- Abstract summary: We propose a Dynamic Uncertainty-aware Explanation supervision (DUE) framework for 3D explanation supervision.
Our proposed framework is validated through comprehensive experiments on diverse real-world medical imaging datasets.
- Score: 12.96084790953902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explanation supervision aims to enhance deep learning models by integrating additional signals to guide the generation of model explanations, showcasing notable improvements in both the predictability and explainability of the model. However, the application of explanation supervision to higher-dimensional data, such as 3D medical images, remains an under-explored domain. Challenges associated with supervising visual explanations in the presence of an additional dimension include: 1) changed spatial correlations, 2) the lack of direct 3D annotations, and 3) uncertainty that varies across different parts of the explanation. To address these challenges, we propose a Dynamic Uncertainty-aware Explanation supervision (DUE) framework for 3D explanation supervision that ensures uncertainty-aware explanation guidance when dealing with sparsely annotated 3D data, using diffusion-based 3D interpolation. Our proposed framework is validated through comprehensive experiments on diverse real-world medical imaging datasets. The results demonstrate the effectiveness of our framework in enhancing the predictability and explainability of deep learning models in the context of medical imaging diagnosis applications.
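The core idea of uncertainty-aware explanation supervision can be illustrated with a minimal sketch: penalize disagreement between a model's explanation (e.g., a 3D saliency volume) and an annotation, down-weighting regions where the annotation is uncertain. This is a generic illustration under stated assumptions, not the DUE objective itself; the paper's actual loss and its diffusion-based 3D imputation are defined in the paper.

```python
import numpy as np

def uncertainty_weighted_explanation_loss(explanation, annotation, uncertainty, eps=1e-8):
    """Weighted L1 penalty between an explanation volume and its annotation.

    Voxels with high uncertainty contribute less to the loss. A hypothetical
    stand-in for uncertainty-aware explanation supervision, not the paper's
    actual DUE objective.
    """
    weights = 1.0 / (uncertainty + eps)   # high uncertainty -> low weight
    weights = weights / weights.sum()     # normalize weights to sum to 1
    return float(np.sum(weights * np.abs(explanation - annotation)))

# Toy 2x2x2 volumes: the only voxel where explanation and annotation
# disagree is also the most uncertain, so its error is down-weighted.
expl = np.zeros((2, 2, 2)); expl[0, 0, 0] = 1.0
annot = np.zeros((2, 2, 2))
unc = np.full((2, 2, 2), 0.1); unc[0, 0, 0] = 10.0

loss = uncertainty_weighted_explanation_loss(expl, annot, unc)
# With uniform uncertainty the same disagreement would cost 1/8 = 0.125;
# here the uncertain voxel's weight shrinks the loss well below that.
uniform_loss = uncertainty_weighted_explanation_loss(expl, annot, np.full((2, 2, 2), 0.1))
```

In a training loop this term would be added to the prediction loss with a trade-off coefficient; the weighting scheme (inverse uncertainty, normalized) is an illustrative assumption.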
Related papers
- AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models [42.57469056850227]
Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic perception and control. We propose a novel framework that integrates depth estimation into VLA models to enrich 3D feature representations.
arXiv Detail & Related papers (2026-02-11T09:57:32Z) - Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis [20.302134776419955]
We propose a reinforcement learning framework with a two-stage training process: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the RL stage, we redesign the consistency reward to explicitly promote coherent, step-by-step diagnostic reasoning. Our model attains state-of-the-art accuracies of 41.92% on CT-RATE and 44.99% on RAD-ChestCT.
arXiv Detail & Related papers (2026-02-01T12:43:11Z) - Spatial-Aware Self-Supervision for Medical 3D Imaging with Multi-Granularity Observable Tasks [4.097364225798782]
We propose a method consisting of three sub-tasks to capture the spatially relevant semantics in medical 3D imaging. Their design adheres to observable principles to ensure interpretability while minimizing the resulting performance loss.
arXiv Detail & Related papers (2025-09-07T08:16:37Z) - P3Net: Progressive and Periodic Perturbation for Semi-Supervised Medical Image Segmentation [60.08541107831459]
We propose a progressive and periodic perturbation mechanism (P3M) and a boundary-focused loss to guide the learning of unlabeled data. Our method achieves state-of-the-art performance on two 2D and 3D datasets.
arXiv Detail & Related papers (2025-05-21T05:35:28Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.
Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - 4D Contrastive Superflows are Dense 3D Representation Learners [62.433137130087445]
We introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing pretraining objectives.
To further boost learning efficiency, we incorporate a plug-and-play view consistency module that enhances alignment of the knowledge distilled from camera views.
arXiv Detail & Related papers (2024-07-08T17:59:54Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation [67.56268991234371]
OV-Uni3DETR achieves the state-of-the-art performance on various scenarios, surpassing existing methods by more than 6% on average.
Code and pre-trained models will be released later.
arXiv Detail & Related papers (2024-03-28T17:05:04Z) - An explainable three dimension framework to uncover learning patterns: A unified look in variable sulci recognition [2.960322639147262]
Three-dimensional (3D) global explanations are crucial in neuroimaging.
We develop a novel explainable artificial intelligence (XAI) 3D-Framework that provides robust, faithful, and low-complexity global explanations.
arXiv Detail & Related papers (2023-09-02T10:46:05Z) - Learning Scene Flow With Skeleton Guidance For 3D Action Recognition [1.5954459915735735]
This work demonstrates the use of 3D flow sequences by a deep temporal model for 3D action recognition.
An extended deep skeleton is also introduced to learn the most discriminant action motion dynamics.
A late fusion scheme is adopted between the two models for learning the high level cross-modal correlations.
arXiv Detail & Related papers (2023-06-23T04:14:25Z) - 3D Vessel Segmentation with Limited Guidance of 2D Structure-agnostic Vessel Annotations [3.6314292723682784]
Supervised deep learning has demonstrated its superior capacity in automatic 3D vessel segmentation.
The reliance on expensive 3D manual annotations and limited capacity for annotation reuse hinder the clinical applications of supervised models.
This paper proposes a novel 3D shape-guided local discrimination model for 3D vascular segmentation under limited guidance from public 2D vessel annotations.
arXiv Detail & Related papers (2023-02-07T07:26:00Z) - RES: A Robust Framework for Guiding Visual Explanation [8.835733039270364]
We propose a framework for guiding visual explanation by developing a novel objective that handles inaccurate boundary, incomplete region, and inconsistent distribution of human annotations.
Experiments on two real-world image datasets demonstrate the effectiveness of the proposed framework in enhancing both the reasonability of the explanation and the performance of the backbone models.
arXiv Detail & Related papers (2022-06-27T16:06:27Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training [45.90045513731704]
This paper revisits an innovative yet simple fully-supervised 3D network pre-training framework.
With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity.
Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence.
arXiv Detail & Related papers (2022-01-05T03:11:21Z) - Unsupervised View-Invariant Human Posture Representation [28.840986167408037]
We present a novel unsupervised approach that learns to extract view-invariant 3D human pose representation from a 2D image.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames.
We show improvements on the state-of-the-art unsupervised cross-view action classification accuracy on RGB and depth images.
arXiv Detail & Related papers (2021-09-17T19:23:31Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.