UniGeo: A Unified 3D Indoor Object Detection Framework Integrating Geometry-Aware Learning and Dynamic Channel Gating
- URL: http://arxiv.org/abs/2601.22616v1
- Date: Fri, 30 Jan 2026 06:15:50 GMT
- Title: UniGeo: A Unified 3D Indoor Object Detection Framework Integrating Geometry-Aware Learning and Dynamic Channel Gating
- Authors: Xing Yi, Jinyang Huang, Feng-Qi Cui, Anyang Tong, Ruimin Wang, Liu Liu, Dan Guo,
- Abstract summary: We propose a unified 3D indoor detection framework called UniGeo. To model geometric relations in scenes, we first propose a geometry-aware learning module that establishes a learnable mapping from spatial relationships to feature weights. Then, to further enhance point cloud feature representation, we propose a dynamic channel gating mechanism that leverages learnable channel-wise weighting.
- Score: 20.395441875640955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing adoption of robotics and augmented reality in real-world applications has driven considerable research interest in 3D object detection based on point clouds. While previous methods address unified training across multiple datasets, they fail to model geometric relationships in sparse point cloud scenes and ignore the feature distribution in significant areas, which ultimately restricts their performance. To deal with this issue, a unified 3D indoor detection framework, called UniGeo, is proposed. To model geometric relations in scenes, we first propose a geometry-aware learning module that establishes a learnable mapping from spatial relationships to feature weights, which enables explicit geometric feature enhancement. Then, to further enhance point cloud feature representation, we propose a dynamic channel gating mechanism that leverages learnable channel-wise weighting. This mechanism adaptively optimizes features generated by the sparse 3D U-Net network, significantly enhancing key geometric information. Extensive experiments on six different indoor scene datasets clearly validate the superior performance of our method.
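The abstract names two ideas without giving formulas: a learnable mapping from spatial relationships to feature weights, and learnable channel-wise gating of point features. The sketch below is only an illustration of what such operations could look like, not the UniGeo implementation. It assumes a squeeze-and-excitation style gate (mean pooling, a ReLU bottleneck, a sigmoid) and a linear map from each point's offset to the scene centroid; the function names `channel_gate` and `geometry_weights` and all parameters are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_gate(features, w1, b1, w2, b2):
    """Assumed SE-style channel gating (not taken from the paper).

    features: N rows of C floats (one row per point or voxel).
    w1: C x H matrix, b1: H biases, w2: H x C matrix, b2: C biases.
    Returns the gated features and the per-channel gate values.
    """
    n, c, h = len(features), len(features[0]), len(b1)
    # Squeeze: average each channel over all points.
    pooled = [sum(row[j] for row in features) / n for j in range(c)]
    # Excite: ReLU bottleneck followed by a sigmoid per channel.
    hidden = [max(0.0, sum(pooled[j] * w1[j][k] for j in range(c)) + b1[k])
              for k in range(h)]
    gate = [sigmoid(sum(hidden[k] * w2[k][j] for k in range(h)) + b2[j])
            for j in range(c)]
    # Scale every point's features by the shared channel gate.
    gated = [[row[j] * gate[j] for j in range(c)] for row in features]
    return gated, gate


def geometry_weights(coords, w, b):
    """Assumed mapping from spatial offsets to scalar weights in (0, 1).

    coords: N points as [x, y, z]; w: 3 learnable weights; b: bias.
    Each point's offset from the scene centroid is mapped linearly,
    then squashed with a sigmoid.
    """
    n = len(coords)
    centroid = [sum(p[d] for p in coords) / n for d in range(3)]
    return [sigmoid(sum((p[d] - centroid[d]) * w[d] for d in range(3)) + b)
            for p in coords]
```

In a real detector these weights would be trained end to end and applied to sparse 3D U-Net features; here they are plain lists so the arithmetic can be followed by hand.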
Related papers
- Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video [76.32954467706581]
We propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams.
We use a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision.
Experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks.
arXiv Detail & Related papers (2026-02-08T09:53:21Z)
- Learning Human Visual Attention on 3D Surfaces through Geometry-Queried Semantic Priors [0.0]
We introduce SemGeo-AttentionNet, a dual-stream architecture that formalizes the interplay between geometric processing and semantic recognition.
We extend our framework to temporal scanpath generation through reinforcement learning.
Evaluation on SAL3D, NUS3D and 3DVA datasets demonstrates substantial improvements.
arXiv Detail & Related papers (2026-02-06T06:15:38Z)
- GeoTeacher: Geometry-Guided Semi-Supervised 3D Object Detection [19.447180486374357]
Semi-supervised 3D object detection has emerged as an active research area in recent years.
We propose GeoTeacher to enhance the student model's ability to capture geometric relations of objects with limited training data.
We introduce a voxel-wise data augmentation strategy that increases the diversity of object geometries.
arXiv Detail & Related papers (2025-12-29T02:24:18Z)
- GRACE: Estimating Geometry-level 3D Human-Scene Contact from 2D Images [54.602947113980655]
Estimating the geometry level of human-scene contact aims to ground specific contact surface points at 3D human geometries.
GRACE (Geometry-level Reasoning for 3D Human-scene Contact Estimation) is a new paradigm for 3D human contact estimation.
It incorporates a point cloud encoder-decoder architecture along with a hierarchical feature extraction and fusion module.
arXiv Detail & Related papers (2025-05-10T09:25:46Z)
- Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction [14.225228781008209]
This paper proposes a novel geometry integration mechanism for 3D scene reconstruction.
Our approach incorporates 3D geometry at three levels, i.e. feature learning, feature fusion, and network supervision.
arXiv Detail & Related papers (2024-08-28T08:02:47Z)
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well trained on massive image datasets.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding [19.134536179555102]
We propose an alternative approach to overcome the limitations of CNN based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z)
- Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues associated with the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.