Visual Object Recognition in Indoor Environments Using Topologically
Persistent Features
- URL: http://arxiv.org/abs/2010.03196v5
- Date: Wed, 28 Jul 2021 18:05:18 GMT
- Title: Visual Object Recognition in Indoor Environments Using Topologically
Persistent Features
- Authors: Ekta U. Samani, Xingjian Yang, Ashis G. Banerjee
- Abstract summary: Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots.
We propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge.
We implement the proposed method on a real-world robot to demonstrate its usefulness.
- Score: 2.2344764434954256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object recognition in unseen indoor environments remains a challenging
problem for visual perception of mobile robots. In this letter, we propose the
use of topologically persistent features, which rely on the objects' shape
information, to address this challenge. In particular, we extract two kinds of
features, namely, sparse persistence image (PI) and amplitude, by applying
persistent homology to multi-directional height function-based filtrations of
the cubical complexes representing the object segmentation maps. The features
are then used to train a fully connected network for recognition. For
performance evaluation, in addition to a widely used shape dataset and a
benchmark indoor scenes dataset, we collect a new dataset, comprising scene
images from two different environments, namely, a living room and a mock
warehouse. The scenes are captured using varying camera poses under different
illumination conditions and include up to five different objects from a given
set of fourteen objects. On the benchmark indoor scenes dataset, sparse PI
features show better recognition performance in unseen environments than the
features learned using the widely used ResNetV2-56 and EfficientNet-B4 models.
Further, they provide slightly higher recall and accuracy values than Faster
R-CNN, an end-to-end object detection method, and its state-of-the-art variant,
Domain Adaptive Faster R-CNN. The performance of our methods also remains
relatively unchanged from the training environment (living room) to the unseen
environment (mock warehouse) in the new dataset. In contrast, the performance
of the object detection methods drops substantially. We also implement the
proposed method on a real-world robot to demonstrate its usefulness.
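A minimal sketch of this pipeline, assuming giotto-tda for the persistent homology step; the direction count, persistence-image resolution, and classifier size are illustrative choices, and the paper's PI sparsification step is omitted:

```python
import numpy as np
from gtda.homology import CubicalPersistence
from gtda.diagrams import PersistenceImage, Amplitude
from sklearn.neural_network import MLPClassifier

def height_filtration(mask, direction):
    """Sublevel-set filtration of a binary segmentation map: each
    foreground pixel gets its height along `direction`; background
    pixels get a sentinel above the maximum so they enter last."""
    ys, xs = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
    heights = xs * direction[0] + ys * direction[1]
    return np.where(mask > 0, heights, heights.max() + 1.0)

# Toy binary segmentation map standing in for a real object mask.
mask = np.zeros((64, 64))
mask[16:48, 20:44] = 1.0

# Eight evenly spaced filtration directions (the count is an assumption).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
filtrations = np.stack([height_filtration(mask, d) for d in directions])

# One persistence diagram (H0 and H1) per direction, via cubical complexes.
diagrams = CubicalPersistence(homology_dimensions=(0, 1)).fit_transform(filtrations)

# Vectorize the diagrams: persistence images and Wasserstein amplitudes.
pi_feats = PersistenceImage(sigma=0.1, n_bins=20).fit_transform(diagrams)
amp_feats = Amplitude(metric="wasserstein").fit_transform(diagrams)
x = np.concatenate([pi_feats.reshape(1, -1), amp_feats.reshape(1, -1)], axis=1)

# The paper trains a fully connected network; a small MLP stands in here.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
# clf.fit(X_train, y_train)  # X_train stacks one vector like `x` per object
print(x.shape)
```

Stacking one such feature vector per segmented object and fitting the classifier reproduces the recognition step in outline.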
Related papers
- Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs [5.891295920078768]
We introduce an advanced approach for fine-grained object visual key field detection.
First, we use the segment anything model (SAM) to generate detailed spatial maps of objects in images.
Next, we use Vision Studio to extract semantic object descriptions.
Third, we employ GPT-4's common sense knowledge, bridging the gap between an object's semantics and its spatial map.
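An illustrative sketch of such a three-stage pipeline, assuming the segment-anything and openai Python packages; the checkpoint and image paths, the prompt, and the captioner standing in for the Vision Studio stage are all placeholders:

```python
# Illustrative sketch only: the checkpoint/image paths, the prompt, and
# `describe_object` (standing in for Vision Studio) are placeholders.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from openai import OpenAI

# Stage 1: SAM generates detailed spatial maps (masks) of objects.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
masks = SamAutomaticMaskGenerator(sam).generate(image)

# Stage 2: semantic object descriptions (placeholder captioner).
def describe_object(mask_record):
    return "a mug on a desk"  # stand-in for the Vision Studio output

# Stage 3: GPT-4 bridges the description and the spatial map.
client = OpenAI()
for m in masks[:1]:
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (f"Object: {describe_object(m)}. "
                        f"Bounding box (XYWH): {m['bbox']}. "
                        "Which sub-region is the key field to interact with?"),
        }],
    )
    print(reply.choices[0].message.content)
```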
arXiv Detail & Related papers (2024-04-01T14:53:36Z)
- Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications [7.274773183842099]
Scene analysis and object recognition play a crucial role in augmented reality (AR).
A new approach is proposed that uses oriented bounding boxes with a detection and recognition deep network to improve performance and processing time.
Results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.
arXiv Detail & Related papers (2023-06-29T09:17:58Z)
- Persistent Homology Meets Object Unity: Object Recognition in Clutter [2.356908851188234]
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots.
We propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning.
THOR outperforms state-of-the-art methods on both datasets and achieves substantially higher recognition accuracy in all scenarios of the UW-IS Occluded dataset.
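This summary does not spell out how TOPS is computed; purely as an illustration of the underlying step such descriptors build on, here is a sketch that turns a depth-derived point cloud into persistence diagrams with giotto-tda:

```python
# Generic illustration only: TOPS itself is not specified in this summary.
# This shows the shared underlying step of computing persistence diagrams
# from a point cloud such as one back-projected from a depth image.
import numpy as np
from gtda.homology import VietorisRipsPersistence

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1, 200, 3))  # stand-in for a depth-derived cloud

vr = VietorisRipsPersistence(homology_dimensions=(0, 1, 2))
diagrams = vr.fit_transform(cloud)  # shape: (1, n_points_in_diagram, 3)
print(diagrams.shape)
```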
arXiv Detail & Related papers (2023-05-05T19:42:39Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present Adaptive Rotated Convolution (ARC) module to handle rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
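A minimal sketch of the kernel-rotation idea (not the ARC authors' implementation): a per-image angle, here a fixed stand-in for a predicted one, rotates the kernel bank via grid_sample before the convolution is applied.

```python
# Minimal sketch of adaptively rotated kernels, not the ARC authors' code.
import torch
import torch.nn.functional as F

def rotate_kernels(weight, angle):
    """Rotate a (out_c, in_c, k, k) kernel bank by `angle` radians."""
    cos, sin = torch.cos(angle), torch.sin(angle)
    zero = torch.zeros(())
    theta = torch.stack([torch.stack([cos, -sin, zero]),
                         torch.stack([sin, cos, zero])]).unsqueeze(0)
    theta = theta.expand(weight.size(0), -1, -1)
    grid = F.affine_grid(theta, weight.size(), align_corners=False)
    return F.grid_sample(weight, grid, align_corners=False)

weight = torch.randn(16, 3, 5, 5)  # kernel bank
x = torch.randn(1, 3, 32, 32)      # one input image
angle = torch.tensor(0.3)          # stand-in for an angle predicted per image
y = F.conv2d(x, rotate_kernels(weight, angle), padding=2)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```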
arXiv Detail & Related papers (2023-03-14T11:53:12Z)
- AGO-Net: Association-Guided 3D Point Cloud Object Detection Network [86.10213302724085]
We propose a novel 3D detection framework that associates intact features for objects via domain adaptation.
We achieve new state-of-the-art performance on the KITTI 3D detection benchmark in both accuracy and speed.
arXiv Detail & Related papers (2022-08-24T16:54:38Z)
- Robust Change Detection Based on Neural Descriptor Fields [53.111397800478294]
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results.
By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises.
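A sketch of the association step, assuming shape codes are already available as vectors (the descriptor network is out of scope): cosine distance plus one-to-one Hungarian matching.

```python
# Sketch of the association step; shape codes are stand-in random vectors
# here, since the descriptor network itself is out of scope.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(codes_a, codes_b):
    """One-to-one object matching by shape-code cosine distance."""
    a = codes_a / np.linalg.norm(codes_a, axis=1, keepdims=True)
    b = codes_b / np.linalg.norm(codes_b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                      # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    return list(zip(rows, cols, cost[rows, cols]))

rng = np.random.default_rng(1)
pairs = associate(rng.normal(size=(4, 32)), rng.normal(size=(5, 32)))
print(pairs)  # (index_a, index_b, distance); threshold distance to flag changes
```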
arXiv Detail & Related papers (2022-08-01T17:45:36Z)
- Combining Local and Global Pose Estimation for Precise Tracking of Similar Objects [2.861848675707602]
We present a multi-object 6D detection and tracking pipeline for potentially similar and non-textured objects.
A new network architecture, trained solely with synthetic images, allows simultaneous pose estimation of multiple objects.
We show how the system can be used in a real AR assistance application within the field of construction.
arXiv Detail & Related papers (2022-01-31T14:36:57Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)
- Object Detection in the Context of Mobile Augmented Reality [16.49070406578342]
We propose a novel approach that combines the geometric information from VIO with semantic information from object detectors to improve the performance of object detection on mobile devices.
Our approach includes three components: (1) an image orientation correction method, (2) a scale-based filtering approach, and (3) an online semantic map.
The results show that our approach can improve on the accuracy of generic object detectors by 12% on our dataset.
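A sketch of the scale-based filtering component (2), assuming a pinhole camera model and illustrative per-class size ranges: a detection is rejected when its implied physical size, computed from VIO depth, is implausible for the predicted label.

```python
# Sketch of component (2), scale-based filtering; the class size ranges
# and the pinhole-model inputs are illustrative assumptions.
PLAUSIBLE_WIDTH_M = {"chair": (0.3, 1.0), "cup": (0.05, 0.15)}

def plausible(label, bbox_width_px, depth_m, fx_px):
    """Physical width ~= pixel width * depth / focal length (pinhole model)."""
    estimated_width = bbox_width_px * depth_m / fx_px
    low, high = PLAUSIBLE_WIDTH_M[label]
    return low <= estimated_width <= high

# A 120 px-wide "cup" at 2.5 m with fx = 600 px implies a 0.5 m-wide cup,
# which is implausible, so the detection is filtered out.
print(plausible("cup", 120, 2.5, 600))  # False
```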
arXiv Detail & Related papers (2020-08-15T05:15:00Z)
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.