Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses
- URL: http://arxiv.org/abs/2107.05468v1
- Date: Mon, 12 Jul 2021 14:36:16 GMT
- Title: Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses
- Authors: Shaoyu Cai, Kening Zhu, Yuki Ban, Takuji Narumi
- Abstract summary: We propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs).
Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by the pen-sliding movement on the surface as the tactile data.
We adopt the conditional-GAN (cGAN) structure together with the residue-fusion (RF) module, and train the model with the additional feature-matching (FM) and perceptual losses to achieve the cross-modal data generation.
- Score: 13.947606247944597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing psychophysical studies have revealed that the cross-modal
visual-tactile perception is common for humans performing daily activities.
However, it is still challenging to build the algorithmic mapping from one
modality space to another, namely the cross-modal visual-tactile data
translation/generation, which could be potentially important for robotic
operation. In this paper, we propose a deep-learning-based approach for
cross-modal visual-tactile data generation by leveraging the framework of the
generative adversarial networks (GANs). Our approach takes the visual image of
a material surface as the visual data, and the accelerometer signal induced by
the pen-sliding movement on the surface as the tactile data. We adopt the
conditional-GAN (cGAN) structure together with the residue-fusion (RF) module,
and train the model with the additional feature-matching (FM) and perceptual
losses to achieve the cross-modal data generation. The experimental results
show that the inclusion of the RF module, and the FM and the perceptual losses
significantly improves cross-modal data generation performance in terms of the
classification accuracy upon the generated data and the visual similarity
between the ground-truth and the generated data.
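The abstract names the ingredients of the training objective (a conditional adversarial loss plus feature-matching and perceptual losses) without giving code. The PyTorch-style snippet below is a rough, non-authoritative sketch of how such an objective could be composed: a PatchGAN-like discriminator is conditioned on the visual input by channel concatenation and the three loss terms are summed. All module shapes, the use of a 2-D spectrogram-like tactile representation, the loss weights, and the `perceptual_net` feature extractor are assumptions for illustration, not the authors' implementation; the residue-fusion (RF) module itself is not sketched because its internals are not described here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondDiscriminator(nn.Module):
    """Conditional PatchGAN-style discriminator that also returns intermediate
    feature maps so a feature-matching loss can be computed against them."""
    def __init__(self, in_ch=2):  # tactile map + visual condition map (1 channel each, assumed)
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2)),
            nn.Sequential(nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2)),
        ])
        self.out = nn.Conv2d(256, 1, 4, 1, 1)  # real/fake score map

    def forward(self, tactile, visual_cond):
        # Conditioning by channel concatenation; visual_cond is assumed to be
        # resized to the tactile representation's spatial size beforehand.
        x = torch.cat([tactile, visual_cond], dim=1)
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        return self.out(x), feats


def generator_loss(disc, perceptual_net, fake_tac, real_tac, visual_cond,
                   lambda_fm=10.0, lambda_perc=10.0):
    """Adversarial + feature-matching (FM) + perceptual terms; the weights
    lambda_fm / lambda_perc are illustrative, not the paper's values."""
    fake_score, fake_feats = disc(fake_tac, visual_cond)
    with torch.no_grad():
        _, real_feats = disc(real_tac, visual_cond)
    # Conditional adversarial term: the generator tries to make D output "real".
    adv = F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
    # Feature-matching term: match D's intermediate features on real vs. generated data.
    fm = sum(F.l1_loss(f, r) for f, r in zip(fake_feats, real_feats))
    # Perceptual term: match features of a frozen pretrained extractor
    # (e.g. a VGG-like network) on real vs. generated tactile representations.
    perc = F.l1_loss(perceptual_net(fake_tac), perceptual_net(real_tac))
    return adv + lambda_fm * fm + lambda_perc * perc
```

The discriminator's own loss would be the usual real-vs-fake binary cross-entropy on the same score map; only the generator-side composition of the three terms is sketched here.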
Related papers
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce a perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Representation Learning for Wearable-Based Applications in the Case of Missing Data [20.37256375888501]
Learning representations from multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations.
We investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches.
Our study provides insights for the design and development of masking-based self-supervised learning tasks.
arXiv Detail & Related papers (2024-01-08T08:21:37Z)
- A Generative Self-Supervised Framework using Functional Connectivity in fMRI Data [15.211387244155725]
Deep neural networks trained on Functional Connectivity (FC) networks extracted from functional Magnetic Resonance Imaging (fMRI) data have gained popularity.
Recent research on the application of Graph Neural Network (GNN) to FC suggests that exploiting the time-varying properties of the FC could significantly improve the accuracy and interpretability of the model prediction.
The high cost of acquiring high-quality fMRI data and corresponding labels poses a hurdle to their application in real-world settings.
We propose a generative SSL approach that is tailored to effectively harness temporal information within dynamic FC.
arXiv Detail & Related papers (2023-12-04T16:14:43Z)
- Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z)
- Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers [58.802352477207094]
We explore the great potential of a pre-trained vision Transformer (ViT) to bridge the vast distribution gap between two modalities.
We propose a mask modeling strategy that randomly masks the tokens of a specific modality to enforce proactive interaction between tokens from different modalities (a minimal illustrative sketch of such modality-token masking appears after this list).
Experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate.
arXiv Detail & Related papers (2023-07-09T08:58:47Z)
- Exploring Invariant Representation for Visible-Infrared Person Re-Identification [77.06940947765406]
Cross-spectral person re-identification, which aims to associate identities to pedestrians across different spectra, faces a main challenge of the modality discrepancy.
In this paper, we address the problem at both the image level and the feature level in an end-to-end hybrid learning framework named the robust feature mining network (RFM).
Experiment results on two standard cross-spectral person re-identification datasets, RegDB and SYSU-MM01, have demonstrated state-of-the-art performance.
arXiv Detail & Related papers (2023-02-02T05:24:50Z)
- A transfer learning enhanced the physics-informed neural network model for vortex-induced vibration [0.0]
This paper proposes a transfer-learning-enhanced physics-informed neural network (PINN) model to study two-dimensional vortex-induced vibration (VIV).
When used in conjunction with transfer learning, the physics-informed neural network improves learning efficiency and maintains predictability on the target task by reusing common-characteristic knowledge from the source model, without requiring a large quantity of data.
arXiv Detail & Related papers (2021-12-29T08:20:23Z)
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
To this end, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders (a minimal illustrative sketch of such a loss appears after this list).
arXiv Detail & Related papers (2020-12-28T02:37:03Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
The SAKDN uses multiple wearable-sensors as teacher modalities and uses RGB videos as student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)
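The RGB-Event tracking entry above only names its mask modeling strategy. As a hedged illustration (not the paper's method), randomly masking the tokens of one modality so the transformer must recover the information from the other could look like the snippet below; the mask ratio, the masking-by-zeroing scheme, and the tensor shapes are all assumptions.

```python
import torch

def mask_one_modality(rgb_tokens, event_tokens, mask_ratio=0.3):
    """Randomly zero out a fraction of tokens from one randomly chosen modality,
    encouraging the model to rely on the other modality's tokens.
    Inputs have shape (batch, num_tokens, dim); the ratio and the choice of
    masking by zeroing are illustrative assumptions."""
    tokens = [rgb_tokens.clone(), event_tokens.clone()]
    target = int(torch.randint(0, 2, (1,)).item())  # pick which modality to mask
    b, n, _ = tokens[target].shape
    num_mask = max(1, int(mask_ratio * n))
    for i in range(b):
        idx = torch.randperm(n)[:num_mask]          # token indices to mask
        tokens[target][i, idx] = 0.0
    return tokens[0], tokens[1]
```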
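Similarly, the Generative Partial Visual-Tactile Fused (GPVTF) clustering entry only names its pseudo-label-based KL-divergence losses. A minimal sketch of one such term, assuming each modality-specific encoder produces soft cluster assignments and the pseudo-label target distribution comes from the other (or a fused) modality, might be:

```python
import torch
import torch.nn.functional as F

def pseudo_label_kl_loss(soft_assignments, pseudo_labels, eps=1e-8):
    """KL-divergence term pushing one modality's soft cluster assignments toward
    a pseudo-label target distribution.
    soft_assignments: (batch, num_clusters) softmax output of one encoder head.
    pseudo_labels:    (batch, num_clusters) target distribution; how it is
    constructed (e.g. from the other modality's sharpened assignments) is an
    assumption for illustration, not the paper's exact formulation."""
    log_q = torch.log(soft_assignments + eps)
    return F.kl_div(log_q, pseudo_labels, reduction="batchmean")
```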