Hierarchical Neural Semantic Representation for 3D Semantic Correspondence
- URL: http://arxiv.org/abs/2509.17431v2
- Date: Tue, 23 Sep 2025 05:56:37 GMT
- Title: Hierarchical Neural Semantic Representation for 3D Semantic Correspondence
- Authors: Keyu Du, Jingyu Hu, Haipeng Li, Hao Xu, Haibing Huang, Chi-Wing Fu, Shuaicheng Liu,
- Abstract summary: We design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories.
- Score: 72.8101601086805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a new approach to estimate accurate and robust 3D semantic correspondence with the hierarchical neural semantic representation. Our work has three key contributions. First, we design the hierarchical neural semantic representation (HNSR), which consists of a global semantic feature to capture high-level structure and multi-resolution local geometric features to preserve fine details, by carefully harnessing 3D priors from pre-trained 3D generative models. Second, we design a progressive global-to-local matching strategy, which establishes coarse semantic correspondence using the global semantic feature, then iteratively refines it with local geometric features, yielding accurate and semantically consistent mappings. Third, our framework is training-free and broadly compatible with various pre-trained 3D generative backbones, demonstrating strong generalization across diverse shape categories. Our method also supports various applications, such as shape co-segmentation, keypoint matching, and texture transfer, and generalizes well to structurally diverse shapes, with promising results even in cross-category scenarios. Both qualitative and quantitative evaluations show that our method outperforms previous state-of-the-art techniques.
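The progressive global-to-local strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: feature extraction from a pre-trained 3D generative model is mocked with random arrays, and the candidate count `k` and all array sizes are illustrative assumptions. The sketch only shows the two-stage idea: a coarse match from global semantic similarity, then re-ranking among the top global candidates by local geometric similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, d_global, d_local = 64, 80, 32, 16

# Mock per-point features for a source and a target shape
# (in the paper these come from a pre-trained 3D generative backbone).
g_src = rng.normal(size=(n_src, d_global))   # global semantic features
g_tgt = rng.normal(size=(n_tgt, d_global))
l_src = rng.normal(size=(n_src, d_local))    # fine local geometric features
l_tgt = rng.normal(size=(n_tgt, d_local))

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stage 1: coarse correspondence from global semantic similarity.
sim_global = cosine_sim(g_src, g_tgt)
coarse = sim_global.argmax(axis=1)           # one target index per source point

# Stage 2: restrict each source point to its k most globally similar
# target candidates, then re-rank those candidates by local similarity.
k = 8
sim_local = cosine_sim(l_src, l_tgt)
candidates = np.argpartition(-sim_global, k, axis=1)[:, :k]   # (n_src, k)
best = sim_local[np.arange(n_src)[:, None], candidates].argmax(axis=1)
refined = candidates[np.arange(n_src), best]  # refined correspondence
```

The actual method iterates this refinement across multiple feature resolutions rather than applying a single re-ranking step; the sketch keeps one level for clarity.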
Related papers
- Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors [0.0]
We introduce SemGeo-AttentionNet, a dual-stream architecture that formalizes the interplay between geometric processing and semantic recognition. We extend our framework to temporal scanpath generation through reinforcement learning. Evaluation on the SAL3D, NUS3D, and 3DVA datasets demonstrates substantial improvements.
arXiv Detail & Related papers (2026-02-06T06:15:38Z)
- SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation [114.57192386025373]
SegSplat is a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments.
arXiv Detail & Related papers (2025-11-23T10:26:38Z)
- CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation [16.85102888388904]
CUS-GS is a compact unified structured Gaussian Splatting representation. We propose a feature-aware significance evaluation strategy to guide anchor growing and pruning. CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters.
arXiv Detail & Related papers (2025-11-22T03:42:49Z)
- Unlocking 3D Affordance Segmentation with 2D Semantic Knowledge [45.19482892758984]
Affordance segmentation aims to parse 3D objects into functionally distinct parts, bridging recognition and interaction for applications in robotic manipulation, embodied AI, and AR. We introduce Cross-Modal Affinity Transfer (CMAT), a pre-training strategy that aligns a 3D encoder with lifted 2D semantics and jointly optimizes reconstruction, affinity, and diversity to yield semantically organized representations. We further design the Cross-modal Affordance Transformer (CAST), which integrates multi-modal prompts with CMAT-pretrained features to generate precise, prompt-aware segmentation maps.
arXiv Detail & Related papers (2025-10-09T15:01:26Z)
- HierOctFusion: Multi-scale Octree-based 3D Shape Generation via Part-Whole-Hierarchy Message Passing [9.953394373473621]
3D content generation remains a fundamental yet challenging task due to the inherent structural complexity of 3D data. We propose HierOctFusion, a part-aware multi-scale octree diffusion model that enhances hierarchical feature interaction for generating fine-grained and sparse object structures. Experiments demonstrate that HierOctFusion achieves superior shape quality and efficiency compared to prior methods.
arXiv Detail & Related papers (2025-08-14T23:12:18Z)
- End-to-End Fine-Tuning of 3D Texture Generation using Differentiable Rewards [8.953379216683732]
We propose an end-to-end differentiable, reinforcement-learning-free framework that embeds human feedback, expressed as differentiable reward functions, directly into the 3D texture pipeline. By back-propagating preference signals through both geometric and appearance modules, our method generates textures that respect the 3D geometry structure and align with desired criteria.
arXiv Detail & Related papers (2025-06-23T06:24:12Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation [122.47961178994456]
SeMv-3D is a novel framework that jointly enhances semantic alignment and multi-view consistency in GT23D generation. At its core, we introduce Triplane Prior Learning (TPL), which effectively learns triplane priors. We also present Prior-based Semantic Aligning in Triplanes (SAT), which enables consistent any-view synthesis.
arXiv Detail & Related papers (2024-10-10T07:02:06Z)
- Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception [41.77153804695413]
An effective pre-training framework with universal 3D representations is extremely desired in perceiving large-scale dynamic scenes.
We propose a CSC framework that places scene-level semantic consistency at its core, bridging similar semantic segments across various scenes.
arXiv Detail & Related papers (2024-05-12T07:58:52Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Zero-Shot 3D Shape Correspondence [67.18775201037732]
We propose a novel zero-shot approach to computing correspondences between 3D shapes.
We exploit the exceptional reasoning capabilities of recent foundation models in language and vision.
Our approach produces highly plausible results in a zero-shot manner, especially between strongly non-isometric shapes.
arXiv Detail & Related papers (2023-06-05T21:14:23Z)
- Graph Stacked Hourglass Networks for 3D Human Pose Estimation [1.0660480034605242]
We propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks.
The proposed architecture consists of repeated encoder-decoder stages, in which graph-structured features are processed across three different scales of human skeletal representations.
arXiv Detail & Related papers (2021-03-30T14:25:43Z)
- Primal-Dual Mesh Convolutional Neural Networks [62.165239866312334]
We apply a primal-dual framework drawn from the graph-neural-network literature to triangle meshes.
Our method takes features for both edges and faces of a 3D mesh as input and dynamically aggregates them.
We provide theoretical insights into our approach using tools from the mesh-simplification literature.
arXiv Detail & Related papers (2020-10-23T14:49:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.