Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes
- URL: http://arxiv.org/abs/2503.13429v1
- Date: Mon, 17 Mar 2025 17:55:15 GMT
- Title: Escaping Plato's Cave: Robust Conceptual Reasoning through Interpretable 3D Neural Object Volumes
- Authors: Nhi Pham, Bernt Schiele, Adam Kortylewski, Jonas Fischer
- Abstract summary: We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design an inherently interpretable and robust classifier by extending existing 3D-aware classifiers with concepts extracted from their volumetric representations for classification. In an array of quantitative metrics for interpretability, we compare against different concept-based approaches across the explainable AI literature and show that CAVE discovers well-grounded concepts that are used consistently across images, while achieving superior robustness.
- Score: 65.63534641857476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rise of neural networks, especially in high-stakes applications, these networks need two properties to ensure their safety: (i) robustness and (ii) interpretability. Recent advances in classifiers with 3D volumetric object representations have demonstrated greatly enhanced robustness on out-of-distribution data. However, these 3D-aware classifiers have not been studied from the perspective of interpretability. We introduce CAVE - Concept Aware Volumes for Explanations - a new direction that unifies interpretability and robustness in image classification. We design an inherently interpretable and robust classifier by extending existing 3D-aware classifiers with concepts extracted from their volumetric representations for classification. In an array of quantitative metrics for interpretability, we compare against different concept-based approaches across the explainable AI literature and show that CAVE discovers well-grounded concepts that are used consistently across images, while achieving superior robustness.
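To make the abstract's core idea concrete, here is a minimal, hypothetical PyTorch sketch of classifying from concept activations pooled over dense features. It illustrates the general pattern only, not the authors' CAVE implementation; all names and shapes (`ConceptClassifier`, `feat_dim`, `n_concepts`) are assumptions.

```python
# Hypothetical sketch of concept-based classification from dense features.
# Shapes and names are illustrative assumptions, not the CAVE code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptClassifier(nn.Module):
    def __init__(self, feat_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Learnable concept vectors; in the paper's setting these would be
        # extracted from the volumetric (3D) object representation.
        self.concepts = nn.Parameter(torch.randn(n_concepts, feat_dim))
        # A linear head over concept activations keeps the decision
        # interpretable: each weight ties one concept to one class.
        self.head = nn.Linear(n_concepts, n_classes)

    def forward(self, feats: torch.Tensor):
        # feats: (B, D, H, W) dense backbone features.
        f = F.normalize(feats.flatten(2), dim=1)       # (B, D, HW)
        c = F.normalize(self.concepts, dim=1)          # (K, D)
        sim = torch.einsum("bdn,kd->bkn", f, c)        # cosine similarity maps
        act = sim.amax(dim=2)                          # max-pool: concept presence
        return self.head(act), act                     # logits + explanations

model = ConceptClassifier(feat_dim=128, n_concepts=32, n_classes=12)
logits, activations = model(torch.randn(2, 128, 56, 56))
print(logits.shape, activations.shape)  # torch.Size([2, 12]) torch.Size([2, 32])
```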
Related papers
- Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning [46.85417907244265]
We propose an interpretable single-view 3DGS framework, termed 3DisGS, to discover both coarse- and fine-grained 3D semantics.
Our model achieves 3D disentanglement while preserving high-quality and rapid reconstruction.
arXiv Detail & Related papers (2025-04-05T14:42:13Z)
- Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding, dubbed CUA-O3D.
Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z)
- ASCENT-ViT: Attention-based Scale-aware Concept Learning Framework for Enhanced Alignment in Vision Transformers [29.932706137805713]
ASCENT-ViT is an attention-based concept learning framework for Vision Transformers (ViTs). It composes scale- and position-aware representations from multiscale feature pyramids and ViT patch representations, respectively. It can be utilized as a classification head on top of standard ViT backbones for improved predictive performance and accurate, robust concept explanations.
arXiv Detail & Related papers (2025-01-16T00:45:05Z)
- GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency [50.11520458252128]
Existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data.
We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models.
GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data.
arXiv Detail & Related papers (2024-12-12T17:59:03Z)
- Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding [58.924180772480504]
3D visual grounding involves finding a target object in a 3D scene that corresponds to a given sentence query.
We propose to leverage weakly supervised annotations to learn the 3D visual grounding model.
We design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.
arXiv Detail & Related papers (2023-07-18T13:49:49Z)
- Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach for adapting 2D networks to process 3D images via an intermediate feature representation.
We show, on all 3D MedMNIST benchmark datasets and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, that our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z)
- Renderers are Good Zero-Shot Representation Learners: Exploring Diffusion Latents for Metric Learning [1.0152838128195467]
We use retrieval as a proxy for measuring the metric learning properties of the latent spaces of Shap-E.
We find that Shap-E representations outperform those of a classical EfficientNet baseline in the zero-shot setting.
These findings give preliminary indication that 3D-based rendering and generative models can yield useful representations for discriminative tasks in our innately 3D-native world.
arXiv Detail & Related papers (2023-06-19T06:41:44Z)
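As a hedged illustration of the retrieval protocol used in the entry above (ranking a gallery by cosine similarity to each query embedding), the following NumPy sketch uses random placeholder embeddings standing in for Shap-E latents or EfficientNet features.

```python
# Generic retrieval-as-metric-learning probe: rank a gallery by cosine
# similarity to each query embedding. Embeddings here are random
# placeholders; in the paper's setting they would come from Shap-E
# latents or an EfficientNet baseline.
import numpy as np

rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 256))    # (n_queries, dim), assumed shapes
gallery = rng.normal(size=(100, 256))  # (n_gallery, dim)
labels_q = rng.integers(0, 10, size=5)
labels_g = rng.integers(0, 10, size=100)

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sim = l2norm(queries) @ l2norm(gallery).T   # cosine similarity matrix
ranking = np.argsort(-sim, axis=1)          # best matches first
recall_at_1 = (labels_g[ranking[:, 0]] == labels_q).mean()
print(f"Recall@1: {recall_at_1:.2f}")
```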
- NOVUM: Neural Object Volumes for Robust Object Classification [22.411611823528272]
We show that explicitly integrating 3D compositional object representations into deep networks for image classification leads to largely enhanced generalization in out-of-distribution scenarios.
In particular, we introduce a novel architecture, referred to as NOVUM, that consists of a feature extractor and a neural object volume for every target object class.
Our experiments show that NOVUM offers intriguing advantages over standard architectures due to the 3D compositional structure of the object representation.
arXiv Detail & Related papers (2023-05-24T03:20:09Z)
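A minimal sketch of the "one neural object volume per class" pattern described above, under assumed shapes and names; the actual NOVUM architecture, rendering, and training differ.

```python
# Hedged sketch: each class keeps a bank of learnable vertex features, and an
# image is scored per class by how well its local features match that bank.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VolumeMatcher(nn.Module):
    def __init__(self, n_classes: int, n_vertices: int, feat_dim: int):
        super().__init__()
        # (n_classes, n_vertices, feat_dim): per-class volume vertex features.
        self.vertex_feats = nn.Parameter(torch.randn(n_classes, n_vertices, feat_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, D, H, W) from any 2D feature extractor.
        f = F.normalize(feats.flatten(2).transpose(1, 2), dim=-1)  # (B, N, D)
        v = F.normalize(self.vertex_feats, dim=-1)                 # (C, V, D)
        # Similarity of every location to every vertex of every class.
        sim = torch.einsum("bnd,cvd->bcnv", f, v)
        # Each location votes for its best vertex; averaging votes gives a score.
        return sim.amax(dim=-1).mean(dim=-1)                       # (B, C) logits

matcher = VolumeMatcher(n_classes=12, n_vertices=64, feat_dim=128)
print(matcher(torch.randn(2, 128, 14, 14)).shape)  # torch.Size([2, 12])
```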
- Understanding Robust Learning through the Lens of Representation Similarities [37.66877172364004]
Robustness to adversarial examples has emerged as a desirable property for deep neural networks (DNNs).
In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training.
arXiv Detail & Related papers (2022-06-20T16:06:20Z)
- Spatial-temporal Concept based Explanation of 3D ConvNets [5.461115214431218]
We present a 3D ACE (Automatic Concept-based Explanation) framework for interpreting 3D ConvNets.
In our approach, videos are represented using high-level supervoxels, which are straightforward for humans to understand.
Experiments show that our method can discover spatial-temporal concepts of different importance levels.
arXiv Detail & Related papers (2022-06-09T08:04:46Z)
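Concept discovery in the ACE family typically clusters segment features; the toy sketch below applies k-means to placeholder supervoxel features and is not the paper's pipeline.

```python
# Toy ACE-style concept discovery: cluster segment (here: supervoxel)
# features so each cluster acts as a candidate concept. Features are random
# placeholders; in the paper's setting they would be 3D-ConvNet activations
# pooled over video supervoxels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
supervoxel_feats = rng.normal(size=(500, 64))  # (n_supervoxels, feat_dim), assumed

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(supervoxel_feats)
concept_ids = kmeans.labels_                   # concept assignment per supervoxel

# A crude importance proxy: larger, tighter clusters first. Real ACE scores
# concepts by their effect on the classifier (e.g., via TCAV).
for cid in range(10):
    members = supervoxel_feats[concept_ids == cid]
    spread = np.linalg.norm(members - members.mean(0), axis=1).mean()
    print(f"concept {cid}: {len(members)} supervoxels, mean spread {spread:.2f}")
```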
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method while using only 50% of the labeled data.
arXiv Detail & Related papers (2019-12-26T08:48:04Z)
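Self-ensembling methods of this kind commonly keep an exponential-moving-average teacher that supervises the student on perturbed inputs. The generic sketch below illustrates that skeleton only; it is not the SESS code, whose perturbation scheme and consistency losses are specific to 3D detection.

```python
# Generic mean-teacher-style self-ensembling skeleton: an EMA teacher
# supervises the student on perturbed unlabeled data. Simplified sketch;
# SESS's actual consistency losses operate on predicted 3D boxes.
import copy
import torch
import torch.nn as nn

student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.01)

def ema_update(teacher, student, decay=0.99):
    # Teacher weights are an exponential moving average of the student's.
    with torch.no_grad():
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(decay).add_(ps, alpha=1 - decay)

for step in range(3):
    x = torch.randn(8, 32)                 # unlabeled batch (placeholder)
    noisy = x + 0.1 * torch.randn_like(x)  # stand-in for the perturbation scheme
    loss = nn.functional.mse_loss(student(noisy), teacher(x))  # consistency
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)
    print(f"step {step}: consistency loss {loss.item():.4f}")
```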
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.