Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
- URL: http://arxiv.org/abs/2512.11263v2
- Date: Mon, 15 Dec 2025 23:25:31 GMT
- Title: Features Emerge as Discrete States: The First Application of SAEs to 3D Representations
- Authors: Albert Miao, Chenliang Zhou, Jiawei Zhou, Cengiz Oztireli
- Abstract summary: Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations. We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations, translating the hidden state into human ideas with high semantic value despite no external intervention or guidance. However, this technique has rarely been applied outside of the textual domain, limiting theoretical explorations of feature decomposition. We present the first application of SAEs to the 3D domain, analyzing the features used by a state-of-the-art 3D reconstruction VAE applied to 53k 3D models from the Objaverse dataset. We observe that the network encodes discrete rather than continuous features, leading to our key finding: such models approximate a discrete state space, driven by phase-like transitions from feature activations. Through this state transition framework, we address three otherwise unintuitive behaviors - the inclination of the reconstruction model towards positional encoding representations, the sigmoidal behavior of reconstruction loss from feature ablation, and the bimodality in the distribution of phase transition points. This final observation suggests the model redistributes the interference caused by superposition to prioritize the saliency of different features. Our work not only compiles and explains unexpected phenomena regarding feature decomposition, but also provides a framework to explain the model's feature learning dynamics. The code and dataset of encoded 3D objects will be available on release.
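To make the dictionary-learning idea in the abstract concrete, here is a minimal sketch of an SAE forward pass and its training objective. All dimensions, weights, and the L1 coefficient are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 64                       # hidden-state dim, dictionary size (k >> d, overcomplete)
W_enc = rng.normal(0, 0.1, (d, k))  # encoder weights
b_enc = np.zeros(k)
W_dec = rng.normal(0, 0.1, (k, d))  # decoder rows act as dictionary directions
b_dec = np.zeros(d)

def sae_forward(x):
    """Encode activations x into sparse feature coefficients, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU yields non-negative, sparse codes
    x_hat = f @ W_dec + b_dec               # reconstruction from active dictionary rows
    return f, x_hat

x = rng.normal(size=(8, d))                 # a batch of hidden states
f, x_hat = sae_forward(x)
recon_loss = np.mean((x - x_hat) ** 2)      # reconstruction term
sparsity = np.mean(np.abs(f))               # L1 penalty encourages few active features
loss = recon_loss + 1e-3 * sparsity
```

Ablating a feature in this framing means zeroing one column of `f` before decoding; the sigmoidal loss response the authors report concerns how reconstruction error changes under such ablations.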
Related papers
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding [86.55824709875598]
We propose a joint enhancement framework for 3D semantic Gaussian modeling that synergizes both semantic and rendering branches. Unlike conventional point cloud shape encoding, we introduce an anisotropic 3D Gaussian Chebyshev descriptor to capture fine-grained 3D shape details. We employ a cross-scene knowledge transfer module to continuously update learned shape patterns, enabling faster convergence and robust representations.
arXiv Detail & Related papers (2026-01-05T18:33:50Z) - PRGCN: A Graph Memory Network for Cross-Sequence Pattern Reuse in 3D Human Pose Estimation [18.771349697842947]
This work introduces the Pattern Reuse Graph Convolutional Network (PRGCN), a novel framework that formalizes pose estimation as a problem of pattern retrieval and adaptation. At its core, PRGCN features a graph memory bank that learns and stores a compact set of pose prototypes, encoded as relational graphs, which are dynamically retrieved via an attention mechanism to provide structured priors. PRGCN establishes a new state of the art, achieving MPJPEs of 37.1mm and 13.4mm on the two evaluated benchmarks, respectively, while exhibiting enhanced cross-domain generalization capability.
arXiv Detail & Related papers (2025-10-22T11:12:07Z) - Semantic Causality-Aware Vision-Based 3D Occupancy Prediction [63.752869043357585]
Vision-based 3D semantic occupancy prediction is a critical task in 3D vision. Existing methods, however, often rely on modular pipelines. We propose a novel causal loss that enables holistic, end-to-end supervision of the modular 2D-to-3D transformation pipeline.
arXiv Detail & Related papers (2025-09-10T08:29:22Z) - Sparse-View 3D Reconstruction: Recent Advances and Open Challenges [0.8583178253811411]
Sparse-view 3D reconstruction is essential for applications in which dense image acquisition is impractical. This survey reviews the latest advances in neural implicit models and explicit point-cloud-based approaches. We analyze how geometric regularization, explicit shape modeling, and generative inference are used to mitigate artifacts.
arXiv Detail & Related papers (2025-07-22T09:57:28Z) - Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Cross-Modal Geometric Hierarchy Fusion: An Implicit-Submap Driven Framework for Resilient 3D Place Recognition [9.411542547451193]
We propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is immune to the interference of original scene point cloud density. With the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye view and 3D segment perspectives.
arXiv Detail & Related papers (2025-06-17T07:04:07Z) - Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding, dubbed CUA-O3D. Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z) - Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Consistency of Implicit and Explicit Features Matters for Monocular 3D Object Detection [4.189643331553922]
Monocular 3D object detection is a common solution for low-cost autonomous agents to perceive their surroundings.
We present CIEF, with the first orientation-aware image backbone to eliminate the disparity of implicit and explicit features in subsequent 3D representation.
CIEF ranked 1st among all reported methods on both 3D and BEV detection benchmark of KITTI at submission time.
arXiv Detail & Related papers (2022-07-16T13:00:32Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
However, they perform poorly on real-world data, which is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
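The encoder/decoder split described above can be sketched as follows: convolutional features live on a grid, and occupancy is queried at arbitrary 3D points. The grid resolution, nearest-neighbor lookup, and tiny MLP below are assumptions for illustration, not the authors' implementation (which uses learned convolutional features and interpolation).

```python
import numpy as np

rng = np.random.default_rng(1)
R, C = 8, 4                            # feature-grid resolution, channels
grid = rng.normal(size=(R, R, R, C))   # stand-in for a convolutional encoder's output

W1 = rng.normal(0, 0.5, (C + 3, 16))   # implicit decoder weights (hypothetical sizes)
W2 = rng.normal(0, 0.5, (16, 1))

def query_occupancy(p):
    """Occupancy probability at point p in [0,1]^3 (nearest-neighbor feature lookup)."""
    idx = np.clip((p * R).astype(int), 0, R - 1)
    feat = grid[idx[0], idx[1], idx[2]]             # local conditioning feature
    h = np.tanh(np.concatenate([feat, p]) @ W1)     # tiny point-wise implicit decoder
    logit = (h @ W2)[0]
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid -> occupancy in (0, 1)

occ = query_occupancy(np.array([0.5, 0.5, 0.5]))
```

Because the decoder is queried per point, the surface can be extracted at any resolution, which is what lets such models scale from single objects to large scenes.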
arXiv Detail & Related papers (2020-03-10T10:17:07Z) - Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.