Shape Anchor Guided Holistic Indoor Scene Understanding
- URL: http://arxiv.org/abs/2309.11133v1
- Date: Wed, 20 Sep 2023 08:30:20 GMT
- Title: Shape Anchor Guided Holistic Indoor Scene Understanding
- Authors: Mingyue Dong, Linxi Huan, Hanjiang Xiong, Shuhan Shen, Xianwei Zheng
- Abstract summary: We propose a shape anchor guided learning strategy (AncLearn) for robust holistic indoor scene understanding.
AncLearn generates anchors that dynamically fit instance surfaces to (i) unmix noise and target-related features, offering reliable proposals at the detection stage, and (ii) reduce outliers in object point sampling during reconstruction.
We embed AncLearn into a reconstruction-from-detection learning system (AncRec) to generate high-quality semantic scene models.
- Score: 9.463220988312218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a shape anchor guided learning strategy (AncLearn) for
robust holistic indoor scene understanding. We observe that the search space
constructed by current methods for proposal feature grouping and instance point
sampling often introduces massive noise to instance detection and mesh
reconstruction. Accordingly, we develop AncLearn to generate anchors that
dynamically fit instance surfaces to (i) unmix noise and target-related
features for offering reliable proposals at the detection stage, and (ii)
reduce outliers in object point sampling for directly providing well-structured
geometry priors without segmentation during reconstruction. We embed AncLearn
into a reconstruction-from-detection learning system (AncRec) to generate
high-quality semantic scene models in a purely instance-oriented manner.
Experiments conducted on the challenging ScanNetv2 dataset demonstrate that our
shape anchor-based method consistently achieves state-of-the-art performance in
terms of 3D object detection, layout estimation, and shape reconstruction. The
code will be available at https://github.com/Geo-Tell/AncRec.
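The core anchor-guided sampling idea, gathering instance points near surface-fitting anchors instead of inside a fixed, noise-prone search region, can be illustrated with a minimal sketch. This is not the authors' implementation: the anchors are assumed to be given, and the function name, radius, and sampling budget are hypothetical.

```python
import numpy as np

def anchor_guided_sampling(points, anchors, radius=0.2, n_samples=1024):
    """Illustrative sketch: keep only scene points that lie within `radius`
    of at least one anchor, then subsample to a fixed budget.

    points  : (N, 3) scene point cloud
    anchors : (K, 3) anchors assumed to lie near the instance surface
    """
    # Distance from every point to its nearest anchor: (N, K) -> (N,)
    d = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=-1)
    near_surface = d.min(axis=1) < radius

    candidates = points[near_surface]
    if len(candidates) == 0:          # degenerate case: fall back to all points
        candidates = points

    # Random subsample to a fixed size (with replacement if too few points)
    idx = np.random.choice(len(candidates), n_samples,
                           replace=len(candidates) < n_samples)
    return candidates[idx]


# Toy usage: a spherical "instance" surface plus background clutter
rng = np.random.default_rng(0)
surface = rng.normal(size=(2000, 3))
surface /= np.linalg.norm(surface, axis=1, keepdims=True)   # unit sphere
clutter = rng.uniform(-3, 3, size=(2000, 3))
scene = np.concatenate([surface, clutter])

anchors = rng.normal(size=(32, 3))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)   # anchors on the surface

sampled = anchor_guided_sampling(scene, anchors)
print(sampled.shape)   # (1024, 3), mostly near the instance surface
```

In AncRec the anchors are predicted and dynamically fitted to instance surfaces by the network; here they are fixed inputs so that the sketch stays self-contained.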
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset [17.530432165466507]
We present a novel Cross-Modal Shape Reconstruction (DisCo) method and an Occupancy-Guided 3D Object Detection (OccGOD) method.
Our methods achieve state-of-the-art performance in both instance-level scene reconstruction and 3D object detection tasks.
arXiv Detail & Related papers (2023-12-19T18:50:10Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
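The PCA-based localization step can be illustrated with a minimal sketch: project dense deep features onto their first principal component and threshold the projection to obtain a coarse foreground mask. The feature extractor and thresholding rule here are assumptions, not the paper's exact procedure.

```python
import numpy as np

def pca_foreground_map(features):
    """Illustrative sketch: project a dense feature map onto its first
    principal component and threshold it to localize a salient object region.

    features : (H, W, C) per-pixel/per-patch deep features (assumed given)
    returns  : (H, W) boolean foreground mask
    """
    h, w, c = features.shape
    x = features.reshape(-1, c)
    x = x - x.mean(axis=0, keepdims=True)

    # First principal direction via SVD of the centered feature matrix
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]                      # projection onto the 1st component

    # Flip the sign so the (usually smaller) object region is the positive side
    if (proj > 0).mean() > 0.5:
        proj = -proj
    return (proj > 0).reshape(h, w)


# Toy usage with random "features" standing in for a real backbone's output
feats = np.random.default_rng(1).normal(size=(32, 32, 64))
mask = pca_foreground_map(feats)
print(mask.shape, mask.mean())            # (32, 32) and the foreground ratio
```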
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ARO-Net: Learning Implicit Fields from Anchored Radial Observations [25.703496065476067]
We introduce anchored radial observations (ARO), a novel shape encoding for learning implicit field representation of 3D shapes.
We develop a general and unified shape representation by employing a fixed set of anchors, via Fibonacci sampling, and designing a coordinate-based deep neural network.
We demonstrate the quality and generality of our network, coined ARO-Net, on surface reconstruction from sparse point clouds.
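One common way to realize a fixed set of anchors via Fibonacci sampling is the golden-angle spiral on a sphere, sketched below; the anchor count and radius are hypothetical, and the coordinate-based network itself is not shown.

```python
import numpy as np

def fibonacci_sphere_anchors(n_anchors=48, radius=1.0):
    """Illustrative sketch: place a fixed set of anchors quasi-uniformly on a
    sphere using the Fibonacci (golden-angle) spiral."""
    i = np.arange(n_anchors)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))      # ~2.39996 rad
    z = 1.0 - 2.0 * (i + 0.5) / n_anchors            # evenly spaced heights in (-1, 1)
    r = np.sqrt(1.0 - z * z)                         # radius of each horizontal circle
    theta = golden_angle * i
    return radius * np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)


anchors = fibonacci_sphere_anchors(48)
print(anchors.shape)                                 # (48, 3), roughly uniform on the sphere
```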
arXiv Detail & Related papers (2022-12-19T16:29:20Z)
- Learning to Complete Object Shapes for Object-level Mapping in Dynamic Scenes [30.500198859451434]
We propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes.
It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior.
We evaluate its effectiveness by quantitatively and qualitatively testing it in both synthetic and real-world sequences.
arXiv Detail & Related papers (2022-08-09T22:56:33Z)
- Point Scene Understanding via Disentangled Instance Mesh Reconstruction [21.92736190195887]
We propose a Disentangled Instance Mesh Reconstruction (DIMR) framework for effective point scene understanding.
A segmentation-based backbone is applied to reduce false positive object proposals.
We leverage a mesh-aware latent code space to disentangle the processes of shape completion and mesh generation.
arXiv Detail & Related papers (2022-03-31T06:36:07Z)
- Stereo Neural Vernier Caliper [57.187088191829886]
We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle a problem of how to predict a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
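The refine-from-an-initial-guess idea can be sketched generically as an iterative residual update applied to a 3D cuboid; the update function below is a stand-in for the learned stereo refinement network, which is not reproduced here.

```python
import numpy as np

def refine_cuboid(box, predict_delta, n_iters=3):
    """Illustrative sketch of iterative box refinement: start from an initial
    3D cuboid guess and repeatedly apply a predicted residual update.

    box           : (7,) array [x, y, z, w, h, l, yaw]
    predict_delta : callable mapping a box to a (7,) residual (stands in for
                    the learned refinement network)
    """
    for _ in range(n_iters):
        box = box + predict_delta(box)
    return box


# Toy usage: pretend the "network" nudges the box toward a known target
target = np.array([1.0, 0.5, 0.0, 1.6, 1.5, 3.9, 0.1])
initial = target + np.array([0.4, -0.3, 0.1, 0.2, -0.1, 0.3, 0.05])
refined = refine_cuboid(initial, lambda b: 0.5 * (target - b))
print(np.abs(refined - target).max())    # residual shrinks with each iteration
```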
arXiv Detail & Related papers (2022-03-21T14:36:07Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection.
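A minimal sketch of semantics-guided point sampling, assuming per-point foreground scores are already available: a farthest-point-sampling loop whose distance criterion is re-weighted by the scores so that likely-foreground points are preferentially retained. The exact weighting used by SASA may differ.

```python
import numpy as np

def semantics_guided_fps(points, fg_scores, n_samples):
    """Illustrative sketch of semantics-guided down-sampling: farthest-point
    sampling with the distance term re-weighted by foreground scores.

    points    : (N, 3) point coordinates
    fg_scores : (N,) estimated foreground probabilities in [0, 1]
    """
    n = len(points)
    selected = [int(np.argmax(fg_scores))]          # start from the most confident point
    min_dist = np.full(n, np.inf)

    while len(selected) < n_samples:
        last = points[selected[-1]]
        min_dist = np.minimum(min_dist, np.linalg.norm(points - last, axis=1))
        # Score-weighted farthest-point criterion
        selected.append(int(np.argmax(min_dist * fg_scores)))
    return np.array(selected)


rng = np.random.default_rng(2)
pts = rng.uniform(-10, 10, size=(5000, 3))
scores = rng.uniform(size=5000)
idx = semantics_guided_fps(pts, scores, 512)
print(idx.shape, scores[idx].mean() > scores.mean())   # foreground-biased subset
```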
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- RfD-Net: Point Scene Understanding by Semantic Instance Reconstruction [19.535169371240073]
We introduce RfD-Net that jointly detects and reconstructs dense object surfaces directly from point clouds.
We decouple the instance reconstruction into global object localization and local shape prediction.
Our approach consistently outperforms the state of the art and improves object reconstruction by over 11 points of mesh IoU.
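The decoupling into global object localization and local shape prediction can be sketched as a two-branch network over per-proposal points; the layer sizes, heads, and shape latent below are assumptions, not RfD-Net's actual architecture.

```python
import torch
import torch.nn as nn

class DetectThenReconstruct(nn.Module):
    """Illustrative sketch of decoupling instance reconstruction into a global
    localization head (box parameters) and a local shape head (per-instance
    shape latent). All sizes and names are assumptions."""

    def __init__(self, feat_dim=128, shape_dim=64):
        super().__init__()
        # Shared per-proposal encoder (stands in for a point-cloud backbone)
        self.encoder = nn.Sequential(nn.Linear(3, feat_dim), nn.ReLU(),
                                     nn.Linear(feat_dim, feat_dim))
        # Global branch: 3D box as (center xyz, size whl, yaw)
        self.box_head = nn.Linear(feat_dim, 7)
        # Local branch: shape latent (a mesh/occupancy decoder is omitted here)
        self.shape_head = nn.Linear(feat_dim, shape_dim)

    def forward(self, proposal_points):
        # proposal_points: (B, N, 3) points gathered for each proposal
        feats = self.encoder(proposal_points).max(dim=1).values   # (B, feat_dim)
        return self.box_head(feats), self.shape_head(feats)


boxes, shape_codes = DetectThenReconstruct()(torch.randn(4, 256, 3))
print(boxes.shape, shape_codes.shape)    # torch.Size([4, 7]) torch.Size([4, 64])
```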
arXiv Detail & Related papers (2020-11-30T12:58:05Z)