Human-Inspired Topological Representations for Visual Object Recognition
in Unseen Environments
- URL: http://arxiv.org/abs/2309.08239v1
- Date: Fri, 15 Sep 2023 08:24:07 GMT
- Title: Human-Inspired Topological Representations for Visual Object Recognition
in Unseen Environments
- Authors: Ekta U. Samani and Ashis G. Banerjee
- Abstract summary: We propose a shape-based TOPS2 descriptor and a THOR2 framework for visual object recognition.
THOR2, trained using synthetic data, achieves substantially higher recognition accuracy than the shape-based THOR framework.
THOR2 is a promising step toward achieving robust recognition in low-cost robots.
- Score: 2.356908851188234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual object recognition in unseen and cluttered indoor environments is a
challenging problem for mobile robots. Toward this goal, we extend our previous
work to propose the TOPS2 descriptor, and an accompanying recognition
framework, THOR2, inspired by a human reasoning mechanism known as object
unity. We interleave color embeddings obtained using the Mapper algorithm for
topological soft clustering with the shape-based TOPS descriptor to obtain the
TOPS2 descriptor. THOR2, trained using synthetic data, achieves substantially
higher recognition accuracy than the shape-based THOR framework and outperforms
RGB-D ViT on two real-world datasets: the benchmark OCID dataset and the UW-IS
Occluded dataset. Therefore, THOR2 is a promising step toward achieving robust
recognition in low-cost robots.
Related papers
- THOR2: Leveraging Topological Soft Clustering of Color Space for Human-Inspired Object Recognition in Unseen Environments [1.9950682531209158]
This study presents a 3D shape and color-based descriptor, TOPS2, for point clouds generated from RGB-D images and an accompanying recognition framework, THOR2.
The TOPS2 descriptor embodies object unity, a human cognition mechanism, by retaining the slicing-based topological representation of 3D shape from the TOPS descriptor.
THOR2, trained using synthetic data, demonstrates markedly improved recognition accuracy compared to THOR, its 3D shape-based predecessor.
arXiv Detail & Related papers (2024-08-02T21:24:14Z) - Persistent Homology Meets Object Unity: Object Recognition in Clutter [2.356908851188234]
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots.
We propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning.
THOR outperforms state-of-the-art methods on both the datasets and achieves substantially higher recognition accuracy for all the scenarios of the UW-IS Occluded dataset.
arXiv Detail & Related papers (2023-05-05T19:42:39Z) - Benchmarking Spatial Relationships in Text-to-Image Generation [102.62422723894232]
We investigate the ability of text-to-image models to generate correct spatial relationships among objects.
We present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them.
arXiv Detail & Related papers (2022-12-20T06:03:51Z) - Multi-Modal Human Authentication Using Silhouettes, Gait and RGB [59.46083527510924]
Whole-body-based human authentication is a promising approach for remote biometrics scenarios.
We propose Dual-Modal Ensemble (DME), which combines both RGB and silhouette data to achieve more robust performances for indoor and outdoor whole-body based recognition.
Within DME, we propose GaitPattern, which is inspired by the double helical gait pattern used in traditional gait analysis.
arXiv Detail & Related papers (2022-10-08T15:17:32Z) - DcnnGrasp: Towards Accurate Grasp Pattern Recognition with Adaptive
Regularizer Learning [13.08779945306727]
Current state-of-the-art methods ignore category information of objects which is crucial for grasp pattern recognition.
This paper presents a novel dual-branch convolutional neural network (DcnnGrasp) to achieve joint learning of object category classification and grasp pattern recognition.
arXiv Detail & Related papers (2022-05-11T00:34:27Z) - Unseen Object Instance Segmentation with Fully Test-time RGB-D
Embeddings Adaptation [14.258456366985444]
Recently, a popular solution is leveraging RGB-D features of large-scale synthetic data and applying the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
arXiv Detail & Related papers (2022-04-21T02:35:20Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms $15$ state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - DRG: Dual Relation Graph for Human-Object Interaction Detection [65.50707710054141]
We tackle the challenging problem of human-object interaction (HOI) detection.
Existing methods either recognize the interaction of each human-object pair in isolation or perform joint inference based on complex appearance-based features.
In this paper, we leverage an abstract spatial-semantic representation to describe each human-object pair and aggregate the contextual information of the scene via a dual relation graph.
arXiv Detail & Related papers (2020-08-26T17:59:40Z) - Unseen Object Instance Segmentation for Robotic Environments [67.88276573341734]
We propose a method to segment unseen object instances in tabletop environments.
UOIS-Net is comprised of two stages: first, it operates only on depth to produce object instance center votes in 2D or 3D.
Surprisingly, our framework is able to learn from synthetic RGB-D data where the RGB is non-photorealistic.
arXiv Detail & Related papers (2020-07-16T01:59:13Z) - Unsupervised Domain Adaptation through Inter-modal Rotation for RGB-D
Object Recognition [31.24587317555857]
We propose a novel RGB-D DA method that reduces the synthetic-to-real domain shift by exploiting the inter-modal relation between the RGB and depth image.
Our method consists of training a convolutional neural network to solve, in addition to the main recognition task, the pretext task of predicting the relative rotation between the RGB and depth image.
arXiv Detail & Related papers (2020-04-21T13:53:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.