Simultaneous Multi-View Object Recognition and Grasping in Open-Ended
Domains
- URL: http://arxiv.org/abs/2106.01866v1
- Date: Thu, 3 Jun 2021 14:12:11 GMT
- Title: Simultaneous Multi-View Object Recognition and Grasping in Open-Ended
Domains
- Authors: Hamidreza Kasaei, Sha Luo, Remo Sasso, Mohammadreza Kasaei
- Abstract summary: We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A robot working in human-centric environments needs to know which kind of
objects exist in the scene, where they are, and how to grasp and manipulate
various objects in different situations to help humans in everyday tasks.
Therefore, object recognition and grasping are two key functionalities for such
robots. Most state-of-the-art approaches treat object recognition and grasping as two
separate problems, even though both rely on the same visual input. Furthermore,
the robot's knowledge is fixed after the training phase; if the robot later
encounters new object categories, it must be retrained from scratch to
incorporate the new information without catastrophic interference. To address this problem, we
propose a deep learning architecture with augmented memory capacities to handle
open-ended object recognition and grasping simultaneously. In particular, our
approach takes multiple views of an object as input and jointly estimates
pixel-wise grasp configuration as well as a deep scale- and rotation-invariant
representation as outputs. The obtained representation is then used for
open-ended object recognition through a meta-active learning technique. We
demonstrate the ability of our approach to grasp never-seen-before objects and
to rapidly learn new object categories using very few examples on-site in both
simulation and real-world settings.
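As a concrete illustration of the pipeline the abstract describes, here is a minimal sketch: multi-view features are pooled into a single object representation, a placeholder pixel-wise grasp-quality map is predicted, and recognition runs against a growable category memory so new categories need no retraining. The random-projection "encoder", the sigmoid grasp head, and the nearest-centroid rule are stand-ins for the paper's deep network and meta-active learner, not the authors' implementation.

```python
# Hedged sketch of the abstract's flow, not the authors' code: pooled
# multi-view features feed a growable category memory (open-ended
# recognition) alongside a pixel-wise grasp map. Shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
W_ENC = rng.standard_normal((64, 128)) / 8.0       # stand-in view encoder

def encode_views(views):
    """Pool per-view features; mean-pooling makes the result view-order invariant."""
    feats = np.stack([np.tanh(v.reshape(-1) @ W_ENC) for v in views])
    return feats.mean(axis=0)

def grasp_quality_map(view):
    """Placeholder per-pixel grasp-quality scores for one view."""
    return 1.0 / (1.0 + np.exp(-view))              # sigmoid, shape == view.shape

class OpenEndedMemory:
    """Nearest-centroid classifier over a category memory that grows on-site."""
    def __init__(self):
        self.memory = {}                             # label -> list of representations

    def teach(self, label, rep):
        self.memory.setdefault(label, []).append(rep)    # no retraining needed

    def recognize(self, rep):
        def cos(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        scores = {c: cos(rep, np.mean(reps, axis=0)) for c, reps in self.memory.items()}
        return max(scores, key=scores.get) if scores else None

views = [rng.standard_normal((8, 8)) for _ in range(3)]  # three views of one object
rep = encode_views(views)
mem = OpenEndedMemory()
mem.teach("mug", rep)                                    # few-shot teaching
print(mem.recognize(rep), grasp_quality_map(views[0]).shape)  # mug (8, 8)
```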
Related papers
- ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
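The multi-task fine-tuning idea in the entry above can be pictured with a toy version: a frozen pre-trained encoder feeds several small task heads whose losses are summed, so the auxiliary tasks jointly shape what the shared features must support. The tasks, shapes, and plain linear heads below are hypothetical; the paper's Task Fusion Decoder is more elaborate.

```python
# Toy multi-task fine-tuning on a frozen encoder (not the paper's Task
# Fusion Decoder): several heads share one representation. Hypothetical
# tasks and shapes throughout.
import numpy as np

rng = np.random.default_rng(1)
W_FROZEN = rng.standard_normal((32, 16)) / 6.0      # frozen pre-trained encoder stub
encode = lambda x: np.tanh(x @ W_FROZEN)

heads = {t: rng.standard_normal((16, 1)) * 0.1 for t in ("grasp", "reach", "contact")}
x = rng.standard_normal((128, 32))
targets = {t: rng.standard_normal((128, 1)) for t in heads}

lr = 0.05
for step in range(300):
    z = encode(x)                                   # shared frozen features
    for t, W in heads.items():                      # one linear head per task
        err = z @ W - targets[t]                    # per-task regression error
        heads[t] = W - lr * (z.T @ err) / len(x)    # gradient step on the head only

total = sum(float(np.mean((encode(x) @ W - targets[t]) ** 2)) for t, W in heads.items())
print(f"summed task loss after fine-tuning: {total:.3f}")
```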
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
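One generic way to read "perceive object pose from a segmentation mask", as in the entry above, is the classic mask-plus-depth recipe sketched below: back-project masked depth pixels into a point cloud, then take its centroid and PCA axes as a coarse pose. This is a textbook reconstruction, not the paper's actual method; the camera intrinsics and shapes are made up.

```python
# Generic pose-from-mask sketch (not the paper's method): back-project
# masked depth pixels to 3-D points, then use centroid + PCA axes as a
# coarse object pose. Intrinsics and image sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
H = W = 64
fx = fy = 60.0; cx = cy = 32.0                     # made-up pinhole intrinsics

depth = 1.0 + 0.05 * rng.standard_normal((H, W))   # synthetic depth image (meters)
mask = np.zeros((H, W), dtype=bool)
mask[20:40, 25:45] = True                          # stand-in foundation-model mask

vs, us = np.nonzero(mask)                          # pixel coordinates inside the mask
z = depth[vs, us]
pts = np.stack([(us - cx) * z / fx, (vs - cy) * z / fy, z], axis=1)  # (N, 3) cloud

centroid = pts.mean(axis=0)                        # translation estimate
cov = np.cov((pts - centroid).T)
_, axes = np.linalg.eigh(cov)                      # columns: principal axes
print("position:", np.round(centroid, 3))
print("orientation (PCA axes):\n", np.round(axes, 2))
```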
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition [6.282068591820947]
We present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem.
To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly.
We have performed extensive sets of experiments to assess the performance of the proposed approach in offline and open-ended scenarios.
arXiv Detail & Related papers (2022-05-04T10:29:10Z)
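The "memory unit for storing and retrieving object information instantly" in the lifelong ensemble entry above can be pictured as an instance store queried by nearest neighbor, one per representation, with a vote across ensemble members. The distance metric and the two random-projection "representations" below are assumptions for illustration.

```python
# Hedged sketch of the lifelong-ensemble idea: each representation has its
# own instance memory; recognition is a vote over per-memory nearest
# neighbors. Representations here are stand-in random projections.
import numpy as np

rng = np.random.default_rng(3)

class MemoryUnit:
    """Instance store with instant (no-retraining) insert and 1-NN retrieval."""
    def __init__(self):
        self.items = []                              # list of (feature, label)

    def store(self, feature, label):
        self.items.append((feature, label))

    def retrieve(self, feature):
        dists = [np.linalg.norm(feature - f) for f, _ in self.items]
        return self.items[int(np.argmin(dists))][1]

# Two "representations" (e.g., shape-like and texture-like); stubs here.
projections = [rng.standard_normal((20, 8)) for _ in range(2)]
ensemble = [MemoryUnit() for _ in projections]

def learn(x, label):
    for P, mem in zip(projections, ensemble):
        mem.store(x @ P, label)

def predict(x):
    votes = [mem.retrieve(x @ P) for P, mem in zip(projections, ensemble)]
    return max(set(votes), key=votes.count)          # majority vote

mug, fork = rng.standard_normal(20), rng.standard_normal(20)
learn(mug, "mug"); learn(fork, "fork")               # few-shot teaching
print(predict(mug + 0.05 * rng.standard_normal(20))) # -> mug
```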
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
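The knowledge-embedded design in the entry above replaces learned one-hot class logits with similarity to fixed, semantically structured class embeddings. A minimal version of that classification rule follows; random vectors stand in for real knowledge-graph embeddings, and this is the spirit of the idea, not the paper's detector.

```python
# Minimal sketch of classifying against fixed semantic class embeddings
# instead of one-hot logits. The "knowledge-graph" vectors are random
# stand-ins, purely illustrative.
import numpy as np

rng = np.random.default_rng(4)
classes = ["cup", "bottle", "bowl"]
kg_embed = {c: rng.standard_normal(16) for c in classes}   # fixed class vectors

def classify(feature):
    """Score a detection feature by cosine similarity to each class embedding."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = {c: cos(feature, e) for c, e in kg_embed.items()}
    return max(scores, key=scores.get), scores

feature = kg_embed["bowl"] + 0.1 * rng.standard_normal(16)  # near-"bowl" feature
print(classify(feature)[0])                                 # -> bowl
```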
- DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration [42.19014385637538]
We propose to teach a robot how to grasp an object with a simple and short human demonstration.
We first present a small sequence of RGB-D images displaying a human-object interaction.
This sequence is then leveraged to build associated hand and object meshes that represent the interaction.
arXiv Detail & Related papers (2021-12-06T08:17:12Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for object blocking relationship (OBR) detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
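A stripped-down version of the entry's "POMDP that integrates the learned neural network modules" is a discrete belief over which object is the referent, updated by Bayes' rule from noisy module outputs, with ask-versus-grasp chosen by a confidence test. Everything below (the number of candidates, the likelihoods, the threshold) is invented for illustration, not taken from INVIGORATE.

```python
# Stripped-down POMDP flavor of the entry above (not INVIGORATE itself):
# keep a belief over candidate referents, condition it on noisy module
# outputs, and only grasp once confident. All numbers are invented.
import numpy as np

belief = np.ones(3) / 3.0                  # uniform belief over 3 candidate objects

def bayes_update(belief, likelihood):
    """Condition the belief on one observation likelihood vector."""
    post = belief * likelihood
    return post / post.sum()

def choose_action(belief, grasp_threshold=0.8):
    return ("grasp", int(np.argmax(belief))) if belief.max() >= grasp_threshold \
        else ("ask_question", None)

# Simulated answers to clarifying questions, each favoring object 2.
for likelihood in (np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.2, 0.7])):
    action = choose_action(belief)
    print(action, np.round(belief, 2))
    if action[0] == "grasp":
        break
    belief = bayes_update(belief, likelihood)

print(choose_action(belief), np.round(belief, 2))   # confident enough to grasp
```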
- Reactive Human-to-Robot Handovers of Arbitrary Objects [57.845894608577495]
We present a vision-based system that enables human-to-robot handovers of unknown objects.
Our approach combines closed-loop motion planning with real-time, temporally-consistent grasp generation.
We demonstrate the generalizability, usability, and robustness of our approach on a novel benchmark set of 26 diverse household objects.
arXiv Detail & Related papers (2020-11-17T21:52:22Z)
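The entry's "closed-loop motion planning with real-time, temporally-consistent grasp generation" suggests a loop that re-detects a grasp every tick, smooths it toward the previous estimate for temporal consistency, and servos the gripper closer. The schematic below uses invented gains and noise; it is not the paper's controller.

```python
# Schematic closed-loop handover (not the paper's controller): every tick,
# re-estimate the grasp, smooth it against the previous estimate, and
# servo one step toward it. Gains and noise levels are invented.
import numpy as np

rng = np.random.default_rng(5)
true_grasp = np.array([0.4, 0.1, 0.9])          # hand-held object's grasp point (m)
gripper = np.zeros(3)
smoothed = None

for tick in range(40):
    observed = true_grasp + 0.01 * rng.standard_normal(3)   # noisy re-detection
    smoothed = observed if smoothed is None else 0.8 * smoothed + 0.2 * observed
    gripper += 0.3 * (smoothed - gripper)                    # servo one step

print("final gripper position:", np.round(gripper, 3))      # ~ true_grasp
```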
- Learning Object-Based State Estimators for Household Robots [11.055133590909097]
We build object-based memory systems that operate on high-dimensional observations and hypotheses.
We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both simulated environments and real images.
arXiv Detail & Related papers (2020-11-06T04:18:52Z)
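An object-based memory over a changing scene, as in the entry above, typically reduces to data association: match each new detection to a remembered object hypothesis if it is close enough, refine that hypothesis, and otherwise spawn a new one. Below is a generic version of that loop; the gate radius and blending weight are assumptions, not values from the paper.

```python
# Generic object-based memory sketch (spirit of the entry above): gate
# each detection against remembered objects, update on a match, otherwise
# remember a new object. Gate and blend values are invented.
import numpy as np

class ObjectMemory:
    def __init__(self, gate=0.5, blend=0.3):
        self.objects = []                    # remembered object positions
        self.gate, self.blend = gate, blend

    def update(self, detections):
        for d in detections:
            if self.objects:
                dists = [np.linalg.norm(d - o) for o in self.objects]
                i = int(np.argmin(dists))
                if dists[i] < self.gate:     # matched: refine the hypothesis
                    self.objects[i] = (1 - self.blend) * self.objects[i] + self.blend * d
                    continue
            self.objects.append(d.copy())    # unmatched: remember a new object

mem = ObjectMemory()
mem.update([np.array([0.0, 0.0]), np.array([2.0, 1.0])])   # two objects appear
mem.update([np.array([0.1, 0.0])])                         # first object moved a bit
print(len(mem.objects), [np.round(o, 2) for o in mem.objects])  # 2 objects tracked
```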
- Open-Ended Fine-Grained 3D Object Categorization by Combining Shape and Texture Features in Multiple Colorspaces [5.89118432388542]
In this work, shape information encodes the common patterns of all categories, while texture information is used to describe the appearance of each instance in detail.
The proposed network architecture outperformed the selected state-of-the-art approaches in terms of object classification accuracy and scalability.
arXiv Detail & Related papers (2020-09-19T14:06:18Z)
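Combining shape and texture features across colorspaces, as in the last entry, amounts to concatenating per-cue descriptors into one vector before matching. The toy version below uses a depth histogram for shape plus color histograms from an RGB and a luma-like second colorspace; these descriptors are stand-ins for the paper's features, not its actual pipeline.

```python
# Toy shape + texture fusion across colorspaces (stand-in descriptors, not
# the paper's): concatenate a shape histogram with color histograms from
# two colorspaces, then categorize by nearest neighbor.
import numpy as np

rng = np.random.default_rng(7)

def describe(depth, rgb):
    shape = np.histogram(depth, bins=8, range=(0.0, 2.0), density=True)[0]
    rgb_hist = np.concatenate([np.histogram(rgb[..., c], bins=4,
                               range=(0.0, 1.0), density=True)[0] for c in range(3)])
    gray = rgb.mean(axis=-1)                 # second "colorspace" (luma-like)
    gray_hist = np.histogram(gray, bins=4, range=(0.0, 1.0), density=True)[0]
    return np.concatenate([shape, rgb_hist, gray_hist])   # fused descriptor

def make_obj(depth_mu, color_mu):
    return (np.clip(depth_mu + 0.1 * rng.standard_normal((16, 16)), 0, 2),
            np.clip(np.asarray(color_mu) + 0.1 * rng.standard_normal((16, 16, 3)), 0, 1))

library = {"red_mug": describe(*make_obj(0.5, [0.8, 0.1, 0.1])),
           "blue_bowl": describe(*make_obj(1.2, [0.1, 0.1, 0.8]))}

query = describe(*make_obj(0.5, [0.8, 0.1, 0.1]))          # another red mug
best = min(library, key=lambda k: np.linalg.norm(query - library[k]))
print(best)                                                 # -> red_mug
```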
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.