Text-driven object affordance for guiding grasp-type recognition in
multimodal robot teaching
- URL: http://arxiv.org/abs/2103.00268v2
- Date: Fri, 12 May 2023 12:35:49 GMT
- Title: Text-driven object affordance for guiding grasp-type recognition in
multimodal robot teaching
- Authors: Naoki Wake, Daichi Saito, Kazuhiro Sasabuchi, Hideki Koike, Katsushi
Ikeuchi
- Abstract summary: This study investigates how text-driven object affordance affects image-based grasp-type recognition in robot teaching.
The authors created labeled datasets of first-person hand images to examine the impact of object affordance on recognition performance.
- Score: 18.529563816600607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates how text-driven object affordance, which provides
prior knowledge about grasp types for each object, affects image-based
grasp-type recognition in robot teaching. The researchers created labeled
datasets of first-person hand images to examine the impact of object affordance
on recognition performance. They evaluated scenarios with real and illusory
objects, considering mixed reality teaching conditions where visual object
information may be limited. The results demonstrate that object affordance
improves image-based recognition by filtering out unlikely grasp types and
emphasizing likely ones. The effectiveness of object affordance was more
pronounced when there was a stronger bias towards specific grasp types for each
object. These findings highlight the significance of object affordance in
multimodal robot teaching, regardless of whether real objects are present in
the images. Sample code is available at
https://github.com/microsoft/arr-grasp-type-recognition.
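The core mechanism described in the abstract, using an affordance prior to filter out unlikely grasp types and emphasize likely ones, can be pictured as a simple probabilistic re-weighting. The sketch below is illustrative only: the grasp-type labels, the prior for a "mug", and the multiply-and-renormalize rule are assumptions made for this example, not the authors' implementation (see the repository linked above for the official sample code).
```python
# Minimal sketch of affordance-guided grasp-type recognition.
# Labels, probabilities, and the fusion rule are illustrative assumptions,
# not the authors' implementation.

GRASP_TYPES = ["power", "precision", "lateral", "tripod"]  # hypothetical label set

def fuse_with_affordance(image_probs, affordance_prior, eps=1e-8):
    """Re-weight image-based grasp-type probabilities with a text-driven
    object-affordance prior, then renormalize.

    image_probs      -- dict: grasp type -> probability from the image model
    affordance_prior -- dict: grasp type -> prior likelihood for the named object
                        (zero entries filter that grasp type out entirely)
    """
    fused = {g: image_probs.get(g, 0.0) * affordance_prior.get(g, 0.0)
             for g in GRASP_TYPES}
    total = sum(fused.values())
    if total < eps:          # prior and image estimate disagree completely
        return image_probs   # fall back to the image-only estimate
    return {g: p / total for g, p in fused.items()}

if __name__ == "__main__":
    # Hypothetical example: the word "mug" biases recognition toward a power grasp.
    image_probs = {"power": 0.35, "precision": 0.40, "lateral": 0.15, "tripod": 0.10}
    mug_prior   = {"power": 0.70, "precision": 0.20, "lateral": 0.10, "tripod": 0.00}
    print(fuse_with_affordance(image_probs, mug_prior))
```
In this made-up example, the fusion shifts the most likely grasp from "precision" to "power", and "tripod" is filtered out entirely because the prior assigns it zero likelihood for a mug.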
Related papers
- Leveraging Foundation Models To learn the shape of semi-fluid deformable objects [0.7895162173260983]
Researchers have shown keen interest over the last decade in characterizing and manipulating deformable objects of a non-fluid nature.
In this paper, we address the characterization of the weld pool to define stable features that serve as information for motion control objectives.
Knowledge distillation from foundation models into a smaller generative model shows prominent results in the characterization of deformable objects.
arXiv Detail & Related papers (2024-11-25T13:41:35Z)
- Which objects help me to act effectively? Reasoning about physically-grounded affordances [0.6291443816903801]
A key aspect of this understanding lies in detecting an object's affordances.
Our approach leverages a dialogue between large language models (LLMs) and vision-language models (VLMs) to achieve open-world affordance detection.
By grounding our system in the physical world, we account for the robot's embodiment and the intrinsic properties of the objects it encounters.
arXiv Detail & Related papers (2024-07-18T11:08:57Z)
- Retrieval Robust to Object Motion Blur [54.34823913494456]
We propose a method for object retrieval in images that are affected by motion blur.
We present the first large-scale datasets for blurred object retrieval.
Our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets.
arXiv Detail & Related papers (2024-04-27T23:22:39Z)
- Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains [0.0]
We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
arXiv Detail & Related papers (2021-06-03T14:12:11Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by a relative 50% (and 33%).
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.