Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision
- URL: http://arxiv.org/abs/2511.16650v1
- Date: Thu, 20 Nov 2025 18:54:31 GMT
- Title: Late-decoupled 3D Hierarchical Semantic Segmentation with Semantic Prototype Discrimination based Bi-branch Supervision
- Authors: Shuyu Cao, Chongshou Li, Jie Xu, Tianrui Li, Na Zhao,
- Abstract summary: 3D hierarchical semantic segmentation is crucial for embodied intelligence applications.<n>Previous methods have overlooked two challenges: I) multi-label learning with a parameter-sharing model can lead to multi-hierarchy conflicts in cross-hierarchy optimization.<n>We propose a novel framework with a primary 3DHS branch and an auxiliary discrimination branch.
- Score: 12.549493479238015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D hierarchical semantic segmentation (3DHS) is crucial for embodied intelligence applications that demand a multi-grained and multi-hierarchy understanding of 3D scenes. Despite the progress, previous 3DHS methods have overlooked following two challenges: I) multi-label learning with a parameter-sharing model can lead to multi-hierarchy conflicts in cross-hierarchy optimization, and II) the class imbalance issue is inevitable across multiple hierarchies of 3D scenes, which makes the model performance become dominated by major classes. To address these issues, we propose a novel framework with a primary 3DHS branch and an auxiliary discrimination branch. Specifically, to alleviate the multi-hierarchy conflicts, we propose a late-decoupled 3DHS framework which employs multiple decoders with the coarse-to-fine hierarchical guidance and consistency. The late-decoupled architecture can mitigate the underfitting and overfitting conflicts among multiple hierarchies and can also constrain the class imbalance problem in each individual hierarchy. Moreover, we introduce a 3DHS-oriented semantic prototype based bi-branch supervision mechanism, which additionally learns class-wise discriminative point cloud features and performs mutual supervision between the auxiliary and 3DHS branches, to enhance the class-imbalance segmentation. Extensive experiments on multiple datasets and backbones demonstrate that our approach achieves state-of-the-art 3DHS performance, and its core components can also be used as a plug-and-play enhancement to improve previous methods.
Related papers
- H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification [17.431440244641585]
H3Former is a novel token-to-region framework for fine-grained visual classification.<n>SAAM exploits multi-scale contextual cues to dynamically construct a weighted hypergraph among tokens.<n>HHCL enforces hierarchical semantic constraints in a non-Euclidean embedding space.
arXiv Detail & Related papers (2025-11-13T12:49:10Z) - A Semantics-Aware Hierarchical Self-Supervised Approach to Classification of Remote Sensing Images [12.282079123411947]
We present a novel Semantics-Aware Hierarchical Consensus (SAHC) method for learning hierarchical features and relationships.<n>The SAHC method is evaluated on three benchmark datasets with different degrees of hierarchical complexity.<n> Experimental results show both the effectiveness of the proposed approach in guiding network learning and the robustness of the hierarchical consensus for remote sensing image classification tasks.
arXiv Detail & Related papers (2025-10-06T15:30:39Z) - Feature Identification for Hierarchical Contrastive Learning [7.655211354400059]
We propose two novel hierarchical contrastive learning (HMLC) methods.<n>Our approach explicitly models inter-class relationships and imbalanced class distribution at higher hierarchy levels.<n>Our method achieves state-of-the-art performance in linear evaluation, outperforming existing hierarchical contrastive learning methods by 2 percentage points in terms of accuracy.
arXiv Detail & Related papers (2025-10-01T12:46:47Z) - CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting [53.15827818829865]
Methods that rely on 2D priors are prone to a critical challenge: cross-view semantic inconsistencies.<n>We propose CCL-LGS, a novel framework that enforces view-consistent semantic supervision by integrating multi-view semantic cues.<n>Our framework explicitly resolves semantic conflicts while preserving category discriminability.
arXiv Detail & Related papers (2025-05-26T19:09:33Z) - Proto-FG3D: Prototype-based Interpretable Fine-Grained 3D Shape Classification [59.68055837500357]
We propose the first prototype-based framework named Proto-FG3D for fine-grained 3D shape classification.<n>Proto-FG3D establishes joint multi-view and multi-category representation learning via Prototype Association.<n>Proto-FG3D surpasses state-of-the-art methods in accuracy, transparent predictions, and ad-hoc interpretability with visualizations.
arXiv Detail & Related papers (2025-05-23T09:31:02Z) - Multi-task Learning with 3D-Aware Regularization [55.97507478913053]
We propose a structured 3D-aware regularizer which interfaces multiple tasks through the projection of features extracted from an image encoder to a shared 3D feature space.
We show that the proposed method is architecture agnostic and can be plugged into various prior multi-task backbones to improve their performance.
arXiv Detail & Related papers (2023-10-02T08:49:56Z) - HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel
Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels.
We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions.
A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
arXiv Detail & Related papers (2023-04-23T17:27:40Z) - HS3: Learning with Proper Task Complexity in Hierarchically Supervised
Semantic Segmentation [81.87943324048756]
We propose Hierarchically Supervised Semantic (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying task complexity.
Our proposed HS3-Fuse framework further improves segmentation predictions and achieves state-of-the-art results on two large segmentation benchmarks: NYUD-v2 and Cityscapes.
arXiv Detail & Related papers (2021-11-03T16:33:29Z) - Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose
Estimation [61.98690211671168]
We propose a Multi-level Attention-Decoder Network (MAED) to model multi-level attentions in a unified framework.
With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE.
arXiv Detail & Related papers (2021-09-06T09:06:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.