THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments
- URL: http://arxiv.org/abs/2408.01579v1
- Date: Fri, 2 Aug 2024 21:24:14 GMT
- Title: THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments
- Authors: Ekta U. Samani, Ashis G. Banerjee
- Abstract summary: This study presents a 3D shape and color-based descriptor, TOPS2, for point clouds generated from RGB-D images and an accompanying recognition framework, THOR2.
The TOPS2 descriptor embodies object unity, a human cognition mechanism, by retaining the slicing-based topological representation of 3D shape from the TOPS descriptor.
THOR2, trained using synthetic data, demonstrates markedly improved recognition accuracy compared to THOR, its 3D shape-based predecessor.
- Score: 1.9950682531209158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual object recognition in unseen and cluttered indoor environments is a challenging problem for mobile robots. This study presents a 3D shape and color-based descriptor, TOPS2, for point clouds generated from RGB-D images and an accompanying recognition framework, THOR2. The TOPS2 descriptor embodies object unity, a human cognition mechanism, by retaining the slicing-based topological representation of 3D shape from the TOPS descriptor while capturing object color information through slicing-based color embeddings computed using a network of coarse color regions. These color regions, analogous to the MacAdam ellipses identified in human color perception, are obtained using the Mapper algorithm, a topological soft-clustering technique. THOR2, trained using synthetic data, demonstrates markedly improved recognition accuracy compared to THOR, its 3D shape-based predecessor, on two benchmark real-world datasets: the OCID dataset, capturing cluttered scenes from different viewpoints, and the UW-IS Occluded dataset, reflecting different environmental conditions and degrees of object occlusion recorded using commodity hardware. THOR2 also outperforms baseline deep learning networks and a widely used ViT adapted for RGB-D inputs on both datasets. Therefore, THOR2 is a promising step toward achieving robust recognition in low-cost robots.
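The "soft" clustering the abstract attributes to the Mapper algorithm can be illustrated with a minimal sketch. This is not the authors' implementation: a faithful application would run Mapper over a perceptual color space with a chosen lens function and a clustering step inside each preimage. Here a toy 1-D lens with overlapping cover intervals shows the key mechanism, namely that points in overlap regions fall into more than one coarse region. Function and parameter names are illustrative.

```python
import numpy as np

def mapper_soft_clusters(lens, n_intervals=5, overlap=0.3):
    """Minimal 1-D Mapper cover: split the lens range into overlapping
    intervals and return, per interval, the indices of the points whose
    lens value falls inside it. Points in an overlap region belong to
    more than one interval, which is what makes the clustering 'soft'.
    (Full Mapper would additionally cluster within each preimage.)"""
    lo, hi = float(lens.min()), float(lens.max())
    # Interval length chosen so that n_intervals intervals with the
    # given fractional overlap exactly tile [lo, hi].
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1.0 - overlap)
    clusters = []
    for i in range(n_intervals):
        a = lo + i * step
        b = hi if i == n_intervals - 1 else a + length
        clusters.append(np.where((lens >= a) & (lens <= b))[0])
    return clusters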
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - 3D Instance Segmentation Using Deep Learning on RGB-D Indoor Data [0.0]
A 2D region-based convolutional neural network (Mask R-CNN) with a point-based rendering module is adapted to integrate depth information and to recognize and segment 3D instances of objects.
To generate 3D point cloud coordinates, the segmented 2D pixels of recognized object regions in the RGB image are merged with the corresponding (u, v) points of the depth image.
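The step described above, turning pixels plus depth readings into 3D coordinates, is standard pinhole back-projection. A minimal sketch (the camera intrinsics fx, fy, cx, cy are assumed known; none of these values come from the paper):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3-D points with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Pixels with zero depth (no sensor reading) are dropped."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (N, 3) point cloud
```

Restricting `valid` to a segmentation mask, as in the paper's pipeline, yields a per-object point cloud instead of a full-scene one.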
arXiv Detail & Related papers (2024-06-19T08:00:35Z) - IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images [50.4538089115248]
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task.
We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion.
Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods.
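Chamfer distance, one of the reconstruction metrics reported above, can be computed for small point sets with a brute-force sketch (the paper's exact evaluation protocol and normalization may differ):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and
    b (M, 3): the mean squared distance from each point to its nearest
    neighbor in the other set, summed over both directions.
    O(N * M) memory, so only suitable for small clouds."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```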
arXiv Detail & Related papers (2024-03-30T07:17:37Z) - Pre-Training LiDAR-Based 3D Object Detectors Through Colorization [65.03659880456048]
We introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels.
GPC teaches the model to colorize LiDAR point clouds, equipping it with valuable semantic cues.
Experimental results on benchmark datasets including KITTI demonstrate GPC's remarkable effectiveness.
arXiv Detail & Related papers (2023-10-23T06:00:24Z) - Human-Inspired Topological Representations for Visual Object Recognition
in Unseen Environments [2.356908851188234]
We propose a 3D shape and color-based TOPS2 descriptor and an accompanying THOR2 framework for visual object recognition.
THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework.
THOR2 is a promising step toward achieving robust recognition in low-cost robots.
arXiv Detail & Related papers (2023-09-15T08:24:07Z) - SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art performance in semantic scene completion on two large-scale benchmark datasets, Matterport3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Topologically Persistent Features-based Object Recognition in Cluttered
Indoor Environments [1.2691047660244335]
Recognition of occluded objects in unseen indoor environments is a challenging problem for mobile robots.
This work proposes a new slicing-based topological descriptor that captures the 3D shape of object point clouds.
The descriptor yields similar representations for occluded objects and their corresponding unoccluded counterparts, enabling object unity-based recognition.
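The slicing idea behind this family of descriptors can be sketched in a few lines. This is a simplified stand-in, not the paper's method: each slice here is reduced to a normalized radial histogram, whereas the actual descriptor computes topological (persistence-based) features per slice. All names and parameters are illustrative.

```python
import numpy as np

def slicing_descriptor(points, n_slices=8, n_bins=6):
    """Slice a point cloud (N, 3) along z and summarize each slice as a
    radial histogram of point distances from the slice centroid; the
    per-slice histograms are concatenated into one feature vector."""
    z = points[:, 2]
    edges = np.linspace(z.min(), z.max() + 1e-9, n_slices + 1)
    desc = np.zeros((n_slices, n_bins))
    for i in range(n_slices):
        sl = points[(z >= edges[i]) & (z < edges[i + 1])]
        if len(sl) == 0:
            continue  # empty slice contributes a zero row
        r = np.linalg.norm(sl[:, :2] - sl[:, :2].mean(axis=0), axis=1)
        hist, _ = np.histogram(r, bins=n_bins, range=(0, r.max() + 1e-9))
        desc[i] = hist / len(sl)  # normalized radial occupancy per slice
    return desc.ravel()
```

Because each slice is summarized independently, occluding the top of an object perturbs only the affected slices, which is one intuition for why slicing supports occlusion-robust matching.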
arXiv Detail & Related papers (2022-05-16T07:01:16Z) - Scale Invariant Semantic Segmentation with RGB-D Fusion [12.650574326251023]
We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information to the RGB data for pixel-wise semantic segmentation to address the different scale objects in an outdoor scene.
Our model is compact and can be easily applied to other RGB models.
arXiv Detail & Related papers (2022-04-10T12:54:27Z) - Unseen Object Instance Segmentation for Robotic Environments [67.88276573341734]
We propose a method to segment unseen object instances in tabletop environments.
UOIS-Net is comprised of two stages: first, it operates only on depth to produce object instance center votes in 2D or 3D.
Surprisingly, our framework is able to learn from synthetic RGB-D data where the RGB is non-photorealistic.
arXiv Detail & Related papers (2020-07-16T01:59:13Z) - Investigating the Importance of Shape Features, Color Constancy, Color
Spaces and Similarity Measures in Open-Ended 3D Object Recognition [4.437005770487858]
We study the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition.
Experimental results show that all combinations of color and shape yield significant improvements over the shape-only and color-only approaches.
arXiv Detail & Related papers (2020-02-10T14:24:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.