A Transfer Learning Approach to Cross-Modal Object Recognition: From
Visual Observation to Robotic Haptic Exploration
- URL: http://arxiv.org/abs/2001.06673v1
- Date: Sat, 18 Jan 2020 14:47:02 GMT
- Title: A Transfer Learning Approach to Cross-Modal Object Recognition: From
Visual Observation to Robotic Haptic Exploration
- Authors: Pietro Falco, Shuang Lu, Ciro Natale, Salvatore Pirozzi, and Dongheui
Lee
- Abstract summary: We introduce the problem of cross-modal visuo-tactile object recognition with robotic active exploration.
We propose an approach consisting of four steps: finding a visuo-tactile common representation, defining a suitable set of features, transferring the features across the domains, and classifying the objects.
The proposed approach achieves an accuracy of 94.7%, which is comparable with the accuracy of the monomodal case.
- Score: 13.482253411041292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce the problem of cross-modal visuo-tactile
object recognition with robotic active exploration. By this we mean that the
robot first observes a set of objects with visual perception and is later able
to recognize those objects through tactile exploration alone, without having
touched any of them before. In machine learning terms, our application has a
visual training set and a tactile test set, or vice versa. To tackle this
problem, we propose an approach consisting of four steps: finding a
visuo-tactile common representation, defining a suitable set of features,
transferring the features across the domains, and classifying the objects. We
show the results of our approach on a set of 15 objects, collecting 40 visual
examples and five tactile examples for each object. The proposed approach
achieves an accuracy of 94.7%, which is comparable with the accuracy of the
monomodal case, i.e., when visual data are used as both the training set and
the test set. Moreover, it performs well compared with human ability, which we
roughly estimated in an experiment with ten participants.
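The four steps above can be made concrete with a small sketch. The snippet
below is a minimal illustration only, not the authors' implementation: it
assumes the visual and tactile data have already been mapped to a common
feature space (steps 1 and 2), and it uses CORAL-style covariance alignment
with a k-NN classifier as stand-ins for the transfer and classification
choices, which the abstract does not specify.

```python
# Hypothetical sketch of the cross-modal pipeline: train on visual features,
# test on tactile features of the same objects. CORAL alignment and k-NN are
# illustrative choices, not the method used in the paper.
import numpy as np
from scipy.linalg import sqrtm
from sklearn.neighbors import KNeighborsClassifier

def coral_align(source, target, eps=1e-6):
    """Match second-order statistics of source features to the target domain.
    Both inputs are (n_samples, n_features) arrays in the shared space."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    # Whiten the source covariance, then re-color with the target covariance.
    whitened = source @ np.linalg.inv(sqrtm(cs))
    return np.real(whitened @ sqrtm(ct))

def cross_modal_recognition(visual_feats, visual_labels, tactile_feats, k=3):
    aligned = coral_align(visual_feats, tactile_feats)  # transfer step
    clf = KNeighborsClassifier(n_neighbors=k)           # classification step
    clf.fit(aligned, visual_labels)
    return clf.predict(tactile_feats)  # predicted labels for the tactile test set
```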
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning [9.178588671620963]
This work aims to recognise latent, unobservable object characteristics.
Vision is commonly used for object recognition by robots, but it is ineffective for detecting hidden objects.
We propose a cross-modal transfer learning approach from vision to haptic-audio.
arXiv Detail & Related papers (2024-03-15T21:18:14Z)
- InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
- The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects [51.22194706674366]
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning.
We also introduce the ObjectFolder Real dataset, including the multisensory measurements for 100 real-world household objects.
arXiv Detail & Related papers (2023-06-01T17:51:22Z)
- Visual-Tactile Multimodality for Following Deformable Linear Objects Using Reinforcement Learning [15.758583731036007]
We study the problem of using vision and tactile inputs together to complete the task of following deformable linear objects.
We create a Reinforcement Learning agent using different sensing modalities and investigate how its behaviour can be boosted.
Our experiments show that the use of both vision and tactile inputs, together with proprioception, allows the agent to complete the task in up to 92% of cases.
arXiv Detail & Related papers (2022-03-31T21:59:08Z)
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation of each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model (a generic version of this supervision pattern is sketched after this list).
This work takes a step on dynamics modeling in hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z) - Simultaneous Multi-View Object Recognition and Grasping in Open-Ended
Domains [0.0]
We propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.
We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings.
arXiv Detail & Related papers (2021-06-03T14:12:11Z) - Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
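As noted in the Dynamic Modeling of Hand-Object Interactions entry above, a
recurring pattern in these works is cross-modal supervision: labels produced by
a vision pipeline are used to train a model that, at test time, sees only
tactile input. The sketch below is a generic, hypothetical rendering of that
pattern; the vision model and the classifier are placeholders, not the authors'
architectures.

```python
# Minimal sketch of cross-modal label supervision: a pretrained vision model
# pseudo-labels time-aligned frames, and those labels supervise a tactile-only
# classifier. All components here are illustrative stand-ins.
from sklearn.linear_model import LogisticRegression

def train_tactile_from_vision(tactile_signals, frames, vision_model):
    """`tactile_signals` and `frames` are time-aligned (n_samples, n_features)
    arrays; `vision_model` is any fitted model exposing a predict() method."""
    pseudo_labels = vision_model.predict(frames)      # supervision from vision
    tactile_clf = LogisticRegression(max_iter=1000)   # tactile-only model
    tactile_clf.fit(tactile_signals, pseudo_labels)
    return tactile_clf  # at test time, only tactile input is required
```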
This list is automatically generated from the titles and abstracts of the papers in this site.