Smart Explorer: Recognizing Objects in Dense Clutter via Interactive
Exploration
- URL: http://arxiv.org/abs/2208.03496v1
- Date: Sat, 6 Aug 2022 11:04:04 GMT
- Title: Smart Explorer: Recognizing Objects in Dense Clutter via Interactive
Exploration
- Authors: Zhenyu Wu, Ziwei Wang, Zibu Wei, Yi Wei and Haibin Yan
- Abstract summary: Recognizing objects in dense clutter accurately plays an important role in a wide variety of robotic manipulation tasks.
We propose an interactive exploration framework called Smart Explorer for recognizing all objects in dense clutter.
- Score: 31.38518623440405
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recognizing objects in dense clutter accurately plays an important role in a
wide variety of robotic manipulation tasks including grasping, packing,
rearranging and many others. However, conventional visual recognition models
usually miss objects because of significant occlusion among instances and
make incorrect predictions due to the visual ambiguity caused by high object
crowdedness. In this paper, we propose an interactive exploration framework
called Smart Explorer for recognizing all objects in dense clutter. Our Smart
Explorer physically interacts with the clutter to maximize recognition
performance while minimizing the number of motions, so that false positives and
false negatives are effectively alleviated with an optimal accuracy-efficiency
trade-off. Specifically, we first collect multi-view RGB-D images of the
clutter and reconstruct the corresponding point cloud. By aggregating the
instance segmentation of the RGB images across views, we acquire an instance-wise
point cloud partition of the clutter, from which the existing classes and the
number of objects for each class are predicted. Pushing actions for
effective physical interaction are then generated to substantially reduce the
recognition uncertainty, which consists of the instance segmentation entropy
and the multi-view object disagreement. Therefore, the optimal
accuracy-efficiency trade-off of object recognition in dense clutter is
achieved via iterative instance prediction and physical interaction. Extensive
experiments demonstrate that our Smart Explorer achieves promising recognition
accuracy with only a few actions and outperforms random pushing by a large margin.
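The abstract describes an iterative loop: estimate recognition uncertainty as the sum of instance segmentation entropy and multi-view object disagreement, and keep pushing the clutter until that uncertainty is low. A minimal sketch of this loop follows; all function names, the variance-based disagreement proxy, and the stopping threshold are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def segmentation_entropy(class_probs):
    """Shannon entropy of per-instance class distributions, summed over instances.

    class_probs: (N, C) array of predicted class probabilities for N instances.
    """
    eps = 1e-12
    return float(-(class_probs * np.log(class_probs + eps)).sum())

def multiview_disagreement(view_counts):
    """Disagreement across views about per-class object counts.

    view_counts: (V, C) array; view_counts[v, c] is the number of objects of
    class c detected in view v. Variance across views, summed over classes,
    is used here as a simple proxy for the paper's disagreement term.
    """
    return float(np.var(view_counts, axis=0).sum())

def recognition_uncertainty(class_probs, view_counts):
    # Total uncertainty = segmentation entropy + multi-view disagreement.
    return segmentation_entropy(class_probs) + multiview_disagreement(view_counts)

def smart_explorer(capture_views, predict, select_push, execute_push,
                   max_actions=5, threshold=0.1):
    """Iterate perception and interaction until uncertainty falls below threshold.

    The four callables are hypothetical placeholders for the framework's
    components: multi-view capture, instance prediction, push planning,
    and push execution. Returns the uncertainty measured at each step.
    """
    history = []
    for _ in range(max_actions):
        class_probs, view_counts = predict(capture_views())
        u = recognition_uncertainty(class_probs, view_counts)
        history.append(u)
        if u < threshold:
            break  # recognition is confident enough; stop interacting
        execute_push(select_push(class_probs, view_counts))
    return history
```

The early stop is what yields the accuracy-efficiency trade-off described above: pushes are executed only while they can still reduce measurable uncertainty.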
Related papers
- Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing [2.0528748158119434]
Multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy.
In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data.
To address this, we propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing.
arXiv Detail & Related papers (2024-09-13T14:50:50Z)
- MARS: Multimodal Active Robotic Sensing for Articulated Characterization [6.69660410213287]
We introduce MARS, a novel framework for articulated object characterization.
It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features.
Our method effectively generalizes to real-world articulated objects, enhancing robot interactions.
arXiv Detail & Related papers (2024-07-01T11:32:39Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
- Object-centric Cross-modal Feature Distillation for Event-based Object Detection [87.50272918262361]
RGB detectors still outperform event-based detectors due to sparsity of the event data and missing visual details.
We develop a novel knowledge distillation approach to shrink the performance gap between these two modalities.
We show that object-centric distillation significantly improves the performance of the event-based student object detector.
arXiv Detail & Related papers (2023-11-09T16:33:08Z)
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged objects, zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
- Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection [0.0]
We propose a mixed-scale triplet network, ZoomNet, which mimics the behavior of humans when observing vague images.
Specifically, our ZoomNet employs the zoom strategy to learn the discriminative mixed-scale semantics by the designed scale integration unit and hierarchical mixed-scale unit.
Our proposed highly task-friendly model consistently surpasses the existing 23 state-of-the-art methods on four public datasets.
arXiv Detail & Related papers (2022-03-05T09:13:52Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.