VisTaNet: Attention Guided Deep Fusion for Surface Roughness
Classification
- URL: http://arxiv.org/abs/2209.08516v1
- Date: Sun, 18 Sep 2022 09:37:06 GMT
- Title: VisTaNet: Attention Guided Deep Fusion for Surface Roughness
Classification
- Authors: Prasanna Kumar Routray, Aditya Sanjiv Kanade, Jay Bhanushali,
Manivannan Muniyandi
- Abstract summary: This paper presents a visual dataset that augments an existing tactile dataset.
We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies.
Our model shows a significant improvement in surface roughness classification accuracy (97.22%) over the tactile-only baseline.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Human texture perception is a weighted average of multi-sensory inputs:
visual and tactile. While the visual sensing mechanism extracts global
features, the tactile mechanism complements it by extracting local features.
The lack of coupled visuotactile datasets in the literature is a challenge for
studying multimodal fusion strategies analogous to human texture perception.
This paper presents a visual dataset that augments an existing tactile dataset.
We propose a novel deep fusion architecture that fuses visual and tactile data
using four types of fusion strategies: summation, concatenation, max-pooling,
and attention. Our model shows significant improvements in surface roughness
classification accuracy (97.22%) over tactile-only (SVM, 92.60%) and
visual-only (FENet-50, 85.01%) architectures. Among the fusion techniques, the
attention-guided architecture achieves the best classification accuracy. Our
study shows that, analogous to human texture perception, the
proposed model chooses a weighted combination of the two modalities (visual and
tactile), thus resulting in higher surface roughness classification accuracy;
and it maximizes the weight of the tactile modality where the visual modality
fails, and vice versa.
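As a rough illustration of the four fusion strategies named above (summation, concatenation, max-pooling, and attention), the PyTorch-style sketch below fuses one visual and one tactile feature vector before classification; the feature dimension, layer names, and the simple softmax gating are assumptions made for illustration, not the exact VisTaNet architecture.

```python
# Minimal sketch of the four fusion strategies named in the abstract.
# Feature dimensions, module names, and the softmax gating are illustrative
# assumptions, not the exact VisTaNet design.
import torch
import torch.nn as nn


class FusionHead(nn.Module):
    def __init__(self, feat_dim=512, num_classes=10, strategy="attention"):
        super().__init__()
        self.strategy = strategy
        in_dim = 2 * feat_dim if strategy == "concat" else feat_dim
        self.attn = nn.Linear(2 * feat_dim, 2)  # one weight per modality
        self.classifier = nn.Linear(in_dim, num_classes)

    def forward(self, visual_feat, tactile_feat):
        if self.strategy == "sum":
            fused = visual_feat + tactile_feat
        elif self.strategy == "concat":
            fused = torch.cat([visual_feat, tactile_feat], dim=-1)
        elif self.strategy == "max":
            fused = torch.maximum(visual_feat, tactile_feat)
        else:  # "attention": learned weighted combination of the modalities
            w = torch.softmax(
                self.attn(torch.cat([visual_feat, tactile_feat], dim=-1)), dim=-1)
            fused = w[:, :1] * visual_feat + w[:, 1:] * tactile_feat
        return self.classifier(fused)


# Dummy per-modality feature vectors (batch of 4, 512-d each).
visual, tactile = torch.randn(4, 512), torch.randn(4, 512)
logits = FusionHead(strategy="attention")(visual, tactile)
print(logits.shape)  # torch.Size([4, 10])
```

The attention branch mirrors the behaviour described in the abstract: a learned softmax over the two modalities decides how much each contributes to the fused feature, so the model can lean on touch where vision fails and vice versa.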
Related papers
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds [12.449232689517538]
We propose a visual-tactile fusion framework for transparent object grasping.
It includes grasping position detection, tactile calibration, and visual-tactile fusion-based classification.
The proposed framework synergizes the advantages of vision and touch, and greatly improves the grasping efficiency of transparent objects.
arXiv Detail & Related papers (2022-11-30T02:27:41Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Scene-Aware Ambidextrous Bin Picking via Physics-based Metaverse Synthesis [72.85526892440251]
We introduce MetaGraspNet, a large-scale photo-realistic bin picking dataset constructed via physics-based metaverse synthesis.
The proposed dataset contains 217k RGBD images across 82 different article types, with full annotations for object detection, amodal perception, keypoint detection, manipulation order, and ambidextrous grasp labels for parallel-jaw and vacuum grippers.
We also provide a real dataset of over 2.3k fully annotated high-quality RGBD images, divided into 5 difficulty levels plus an unseen object set to evaluate different object and layout properties.
arXiv Detail & Related papers (2022-08-08T08:15:34Z)
- Tactile-ViewGCN: Learning Shape Descriptor from Tactile Data using Graph Convolutional Network [0.4189643331553922]
This work focuses on improving previous methods for object classification using tactile data.
We propose a novel method, dubbed Tactile-ViewGCN, that hierarchically aggregates tactile features.
Our model outperforms previous methods on the STAG dataset with an accuracy of 81.82%.
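As a loose illustration of aggregating per-view tactile features with graph convolutions, the sketch below passes messages over a fully connected view graph and pools before classification; the graph construction, layer sizes, and pooling are assumptions made here, not the published Tactile-ViewGCN.

```python
# Rough sketch: aggregate per-view tactile features with simple graph
# convolutions over a fully connected view graph. Illustrative only; not the
# actual Tactile-ViewGCN implementation.
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_views, in_dim); adj: (num_views, num_views), row-normalized
        return torch.relu(self.proj(adj @ x))


num_views, feat_dim, num_classes = 8, 128, 10
views = torch.randn(num_views, feat_dim)                   # per-view tactile features
adj = torch.full((num_views, num_views), 1.0 / num_views)  # fully connected graph

gc1, gc2 = SimpleGraphConv(feat_dim, 64), SimpleGraphConv(64, 64)
h = gc2(gc1(views, adj), adj)                              # two rounds of message passing
logits = nn.Linear(64, num_classes)(h.mean(dim=0))         # pool views, then classify
print(logits.shape)  # torch.Size([10])
```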
arXiv Detail & Related papers (2022-03-12T05:58:21Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
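A minimal sketch of the top-down feedback idea, under the assumption that a high-level fused summary gates the raw modality inputs on a second forward pass; the encoders, gates, and shapes below are illustrative, not the MMLatch architecture itself.

```python
# Loose sketch of top-down cross-modal feedback: a bottom-up summary gates the
# raw inputs before they are re-encoded. Conceptual illustration only.
import torch
import torch.nn as nn

dim = 64
encode_a = nn.GRU(dim, dim, batch_first=True)   # modality A encoder
encode_b = nn.GRU(dim, dim, batch_first=True)   # modality B encoder
gate_a = nn.Linear(2 * dim, dim)                # top-down gate for A
gate_b = nn.Linear(2 * dim, dim)                # top-down gate for B

x_a = torch.randn(2, 20, dim)                   # (batch, time, features)
x_b = torch.randn(2, 20, dim)

# Bottom-up pass: encode each modality and form a high-level summary.
_, h_a = encode_a(x_a)
_, h_b = encode_b(x_b)
summary = torch.cat([h_a[-1], h_b[-1]], dim=-1)

# Top-down pass: the summary gates the raw inputs before re-encoding.
x_a_gated = x_a * torch.sigmoid(gate_a(summary)).unsqueeze(1)
x_b_gated = x_b * torch.sigmoid(gate_b(summary)).unsqueeze(1)
_, h_a2 = encode_a(x_a_gated)
_, h_b2 = encode_b(x_b_gated)
fused = torch.cat([h_a2[-1], h_b2[-1]], dim=-1)  # goes to a task head
print(fused.shape)  # torch.Size([2, 128])
```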
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
arXiv Detail & Related papers (2021-08-11T03:49:59Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioned on the other.
To this end, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders.
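A small sketch of what pseudo-label-based KL-divergence losses for two modality-specific heads could look like; the soft pseudo-label targets below are stand-ins, not the GPVTF clustering pipeline.

```python
# One KL-divergence loss per modality-specific head, pulled toward shared soft
# pseudo-labels. Illustrative stand-in; not the GPVTF code.
import torch
import torch.nn.functional as F

batch, num_clusters = 8, 5
visual_logits = torch.randn(batch, num_clusters, requires_grad=True)
tactile_logits = torch.randn(batch, num_clusters, requires_grad=True)

# Soft pseudo-label target distributions (in practice produced by clustering).
target = torch.softmax(torch.randn(batch, num_clusters), dim=-1)

loss_visual = F.kl_div(F.log_softmax(visual_logits, dim=-1), target,
                       reduction="batchmean")
loss_tactile = F.kl_div(F.log_softmax(tactile_logits, dim=-1), target,
                        reduction="batchmean")
(loss_visual + loss_tactile).backward()  # gradients flow to each modality head
```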
arXiv Detail & Related papers (2020-12-28T02:37:03Z)
- Spatio-temporal Attention Model for Tactile Texture Recognition [25.06942319117782]
We propose a novel Spatio-Temporal Attention Model (STAM) for tactile texture recognition.
The proposed STAM attends to both the spatial focus of each single tactile texture and the temporal correlation within a tactile sequence.
In experiments discriminating 100 different fabric textures, the spatially and temporally selective attention yields a significant improvement in recognition accuracy.
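A schematic sketch of pairing spatial attention within each tactile frame with temporal attention across the frame sequence; the taxel count, feature sizes, and pooling are assumptions made for illustration, not the published STAM model.

```python
# Spatial attention pools taxels inside each frame; temporal attention pools
# frames across the sequence. Shapes are illustrative assumptions.
import torch
import torch.nn as nn

batch, time, taxels, dim = 2, 16, 64, 32
frames = torch.randn(batch, time, taxels, dim)   # per-taxel features per frame

spatial_score = nn.Linear(dim, 1)
temporal_score = nn.Linear(dim, 1)

# Spatial attention: weight taxels within each frame, then pool.
w_s = torch.softmax(spatial_score(frames), dim=2)        # (batch, time, taxels, 1)
frame_desc = (w_s * frames).sum(dim=2)                   # (batch, time, dim)

# Temporal attention: weight frames across the sequence, then pool.
w_t = torch.softmax(temporal_score(frame_desc), dim=1)   # (batch, time, 1)
sequence_desc = (w_t * frame_desc).sum(dim=1)            # (batch, dim)

logits = nn.Linear(dim, 100)(sequence_desc)              # e.g., 100 fabric classes
print(logits.shape)  # torch.Size([2, 100])
```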
arXiv Detail & Related papers (2020-08-10T22:32:34Z)
- TactileSGNet: A Spiking Graph Neural Network for Event-based Tactile Object Recognition [17.37142241982902]
New advances in flexible, event-driven electronic skins may soon endow robots with touch perception capabilities similar to those of humans.
The unique features of these sensors may render current deep learning approaches, such as convolutional feature extractors, unsuitable for tactile learning.
We propose a novel spiking graph neural network for event-based tactile object recognition.
arXiv Detail & Related papers (2020-08-01T03:35:15Z)
- Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images [4.666400601228301]
We introduce the challenging task of estimating a set of tactile physical properties from visual information.
We construct a first-of-its-kind image-tactile dataset with over 400 multiview image sequences and the corresponding tactile properties.
We develop a cross-modal framework comprising an adversarial objective and a novel visuo-tactile joint classification loss.
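A rough sketch of pairing an adversarial objective with a joint visuo-tactile classification loss; the heads, shapes, and loss weighting below are assumptions made for illustration, not the paper's actual training setup.

```python
# An estimator maps image features to tactile properties; a discriminator
# scores real vs. estimated properties; a joint head classifies the fused
# representation. Illustrative assumptions throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, num_props, num_classes, batch = 128, 4, 10, 8
estimator = nn.Linear(dim, num_props)        # image features -> tactile properties
discriminator = nn.Linear(num_props, 1)      # real vs. estimated properties
joint_head = nn.Linear(dim + num_props, num_classes)

img_feat = torch.randn(batch, dim)           # stand-in visual features
real_props = torch.randn(batch, num_props)   # stand-in measured tactile properties
labels = torch.randint(0, num_classes, (batch,))

est_props = estimator(img_feat)

# Discriminator step: tell measured properties apart from estimated ones.
d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_props),
                                              torch.ones(batch, 1)) +
          F.binary_cross_entropy_with_logits(discriminator(est_props.detach()),
                                              torch.zeros(batch, 1)))

# Estimator step: adversarial term plus the joint visuo-tactile classification loss.
adv_loss = F.binary_cross_entropy_with_logits(discriminator(est_props),
                                              torch.ones(batch, 1))
cls_loss = F.cross_entropy(joint_head(torch.cat([img_feat, est_props], dim=-1)), labels)
total_loss = cls_loss + 0.1 * adv_loss       # the weighting here is arbitrary
print(float(d_loss), float(total_loss))
```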
arXiv Detail & Related papers (2020-04-29T21:27:26Z)