Generative Partial Visual-Tactile Fused Object Clustering
- URL: http://arxiv.org/abs/2012.14070v2
- Date: Sun, 14 Feb 2021 08:15:48 GMT
- Title: Generative Partial Visual-Tactile Fused Object Clustering
- Authors: Tao Zhang and Yang Cong and Gan Sun and Jiahua Dong and Yuyang Liu and
Zhengming Ding
- Abstract summary: We propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
Finally, two pseudo-label-based KL-divergence losses are employed to update the corresponding modality-specific encoders.
- Score: 81.17645983141773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual-tactile fused sensing for object clustering has achieved significant
progress recently, since involving the tactile modality can effectively improve
clustering performance. However, missing-data (i.e., partial-data) issues often
arise due to occlusion and noise during data collection. Most existing partial
multi-view clustering methods do not solve this issue well because of the
heterogeneous-modality challenge, and naively applying them inevitably introduces
negative effects that further hurt performance. To address these challenges, we
propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for
object clustering. More specifically, we first extract partial visual and tactile
features from the partial visual and tactile data, respectively, and encode the
extracted features into modality-specific feature subspaces. A conditional
cross-modal clustering generative adversarial network is then developed to
synthesize one modality conditioned on the other, which compensates for missing
samples and naturally aligns the visual and tactile modalities through
adversarial learning. Finally, two pseudo-label-based KL-divergence losses are
employed to update the corresponding modality-specific encoders. Extensive
comparative experiments on three public visual-tactile datasets demonstrate the
effectiveness of our method.
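To make the pipeline concrete, here is a minimal PyTorch sketch of the three ingredients the abstract names: modality-specific encoders, a conditional cross-modal GAN that synthesizes one modality's latent code from the other, and pseudo-label-based KL-divergence losses. All sizes, module names, and the DEC-style soft-assignment clustering head are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the GPVTF idea described in the abstract.
# All architecture sizes, module names, and the clustering head are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_FEAT, D_LATENT, N_CLUSTERS = 512, 64, 10

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, d_out))

# Modality-specific encoders map extracted features into subspaces.
enc_v, enc_t = mlp(D_FEAT, D_LATENT), mlp(D_FEAT, D_LATENT)

# Conditional cross-modal generators: synthesize one modality's latent
# code conditioned on the other, to compensate for missing samples.
gen_v2t, gen_t2v = mlp(D_LATENT, D_LATENT), mlp(D_LATENT, D_LATENT)

# Discriminators try to tell real latent codes from synthesized ones,
# aligning the two modalities adversarially.
disc_v, disc_t = mlp(D_LATENT, 1), mlp(D_LATENT, 1)

# Soft cluster assignments (Student's t kernel, as in DEC-style methods).
centers = nn.Parameter(torch.randn(N_CLUSTERS, D_LATENT))

def soft_assign(z):
    q = 1.0 / (1.0 + torch.cdist(z, centers) ** 2)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Sharpened pseudo-labels used as the KL target.
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

def losses(x_v, x_t):
    z_v, z_t = enc_v(x_v), enc_t(x_t)
    fake_t, fake_v = gen_v2t(z_v), gen_t2v(z_t)

    # Non-saturating adversarial losses, one per synthesis direction.
    adv = F.binary_cross_entropy_with_logits
    loss_gan = (adv(disc_t(fake_t), torch.ones(len(fake_t), 1)) +
                adv(disc_v(fake_v), torch.ones(len(fake_v), 1)))

    # Two pseudo-label-based KL-divergence losses, one per modality,
    # used to update the corresponding modality-specific encoder.
    q_v, q_t = soft_assign(z_v), soft_assign(z_t)
    loss_kl = (F.kl_div(q_v.log(), target_distribution(q_v).detach(),
                        reduction="batchmean") +
               F.kl_div(q_t.log(), target_distribution(q_t).detach(),
                        reduction="batchmean"))
    return loss_gan + loss_kl
```

In a full training loop the discriminators would also be trained on real versus synthesized codes, and the synthesized codes would stand in for whichever modality is missing in a partial sample.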
Related papers
- Fast Disentangled Slim Tensor Learning for Multi-view Clustering [28.950845031752927]
We propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering.
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
Our proposed model is computationally efficient and can be solved effectively.
arXiv Detail & Related papers (2024-11-12T09:57:53Z)
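DSTL's robust-PCA-inspired split of each view's representation into a semantic-related part and a semantic-unrelated (redundant) part can be illustrated with the classic robust PCA decomposition below. DSTL's actual objective and tensor machinery differ, so treat this as a conceptual sketch only.

```python
# Conceptual sketch: classic robust PCA splits a data matrix into a
# low-rank part L (cf. DSTL's semantic-related representation) and a
# sparse part S (cf. the semantic-unrelated, redundant part). DSTL
# itself uses a different, tensor-based objective.
import numpy as np

def robust_pca(X, lam=None, n_iter=100, mu=1.0):
    m, n = X.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    L = np.zeros_like(X); S = np.zeros_like(X); Y = np.zeros_like(X)
    for _ in range(n_iter):
        # Singular-value thresholding -> low-rank component.
        U, sig, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Elementwise soft-thresholding -> sparse component.
        R = X - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y += mu * (X - L - S)  # dual (ADMM) update
    return L, S
```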
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
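One simple way to realize the background-augmentation idea, assuming object masks are available, is to inpaint everything outside the object with an off-the-shelf diffusion inpainting model. The snippet below uses the Hugging Face diffusers library as a generic sketch; it is not the paper's own pipeline, and the checkpoint name is an assumption.

```python
# Generic sketch of diffusion-based background augmentation (not the
# paper's pipeline): keep the foreground object, regenerate the rest.
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting")  # assumed checkpoint

def augment_background(image_path, object_mask_path, prompt):
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    # White pixels in the mask get repainted, so invert the object mask:
    # the object (white in object_mask) stays, the background changes.
    mask = ImageOps.invert(
        Image.open(object_mask_path).convert("L").resize((512, 512)))
    return pipe(prompt=prompt, image=image, mask_image=mask).images[0]

# e.g. augment_background("car.png", "car_mask.png",
#                         "a car parked on a snowy mountain road")
```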
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- A Contrastive Variational Graph Auto-Encoder for Node Clustering [10.52321770126932]
State-of-the-art clustering methods still face numerous challenges.
Existing VGAEs do not account for the discrepancy between the inference and generative models.
Our solution has two mechanisms to control the trade-off between Feature Randomness and Feature Drift.
arXiv Detail & Related papers (2023-12-28T05:07:57Z)
- Feature Completion Transformer for Occluded Person Re-identification [25.159974510754992]
Occluded person re-identification (Re-ID) is a challenging problem due to the information loss caused by occluders.
We propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space.
FCFormer achieves superior performance and outperforms the state-of-the-art methods by significant margins on occluded datasets.
arXiv Detail & Related papers (2023-03-03T01:12:57Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
It is a promising solution to take advantage of generative models to hallucinate realistic unseen samples based on the knowledge learned from the seen classes.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
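The conditional affine coupling layers mentioned for GSMFlow follow the general RealNVP recipe: split the features in half, and let the scale and shift of one half depend on the other half plus a conditioning vector (here, a class semantic embedding). The sketch below is that generic construction, not GSMFlow's exact architecture.

```python
# Generic conditional affine coupling layer (RealNVP-style); GSMFlow's
# exact architecture and extra components are not reproduced here.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        # The net sees one half of x plus the condition (class semantics).
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                  # stabilize the log-scale
        y2 = x2 * torch.exp(s) + t         # invertible affine transform
        log_det = s.sum(dim=1)             # log|det J| for the flow loss
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)
```

Stacking several such layers, permuting the halves between layers, yields an invertible generator that can synthesize unseen-class features conditioned on their semantic descriptions.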
- Visual-Tactile Cross-Modal Data Generation using Residue-Fusion GAN with Feature-Matching and Perceptual Losses [13.947606247944597]
We propose a deep-learning-based approach for cross-modal visual-tactile data generation by leveraging the framework of generative adversarial networks (GANs).
Our approach takes the visual image of a material surface as the visual data, and the accelerometer signal induced by the pen-sliding movement on the surface as the tactile data.
We adopt the conditional-GAN (cGAN) structure together with the residue-fusion (RF) module, and train the model with the additional feature-matching (FM) and perceptual losses to achieve the cross-modal data generation.
arXiv Detail & Related papers (2021-07-12T14:36:16Z)
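The feature-matching and perceptual losses used on top of the cGAN are standard components; a minimal sketch (pix2pixHD-style feature matching against discriminator activations, plus a fixed-VGG perceptual loss) is shown below. The residue-fusion module itself is paper-specific and not reproduced.

```python
# Sketch of the two auxiliary losses named in the summary; the cGAN and
# residue-fusion module themselves are paper-specific and omitted.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def feature_matching_loss(disc_feats_real, disc_feats_fake):
    # L1 distance between intermediate discriminator activations
    # (pix2pixHD-style); each argument is a list of feature tensors.
    return sum(F.l1_loss(f_fake, f_real.detach())
               for f_real, f_fake in zip(disc_feats_real, disc_feats_fake))

# Fixed VGG-16 trunk up to relu3_3 for the perceptual loss.
vgg = vgg16(weights="DEFAULT").features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake, real):
    # Compare fixed VGG features of generated vs. target images.
    return F.l1_loss(vgg(fake), vgg(real))
```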
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
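Adversarial augmentation on intermediate feature embeddings, as opposed to input pixels, can be sketched as a one-step FGSM-style perturbation in feature space; the paper's full method (including its normalization component) is more involved.

```python
# One-step adversarial perturbation in feature space (FGSM-style);
# illustrative only -- the paper's full augmentation/normalization
# scheme is more involved.
import torch
import torch.nn.functional as F

def adversarial_feature_augment(backbone, head, x, y, eps=0.1):
    feats = backbone(x)                        # intermediate embeddings
    feats_adv = feats.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(head(feats_adv), y)
    grad, = torch.autograd.grad(loss, feats_adv)
    # Perturb features in the direction that increases the loss, then
    # train on both clean and perturbed embeddings.
    feats_adv = feats_adv.detach() + eps * grad.sign()
    clean = F.cross_entropy(head(feats), y)
    adv = F.cross_entropy(head(feats_adv), y)
    return clean + adv
```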
- Transductive Zero-Shot Learning by Decoupled Feature Generation [30.664199050468472]
We focus on the transductive setting, in which unlabelled visual data from unseen classes is available.
We propose to decouple tasks of generating realistic visual features and translating semantic attributes into visual cues.
We present a detailed ablation study to dissect the effect of our proposed decoupling approach, while demonstrating its superiority over the related state-of-the-art.
arXiv Detail & Related papers (2021-02-05T16:17:52Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
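Integrating features from adjacent encoder levels, as the aggregate interaction modules above do, boils down to resampling neighboring feature maps to a common resolution and fusing them. The sketch below shows that generic pattern, not MINet's exact module.

```python
# Generic sketch of fusing adjacent-level features at a common scale;
# MINet's actual aggregate/self-interaction modules are more elaborate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentFusion(nn.Module):
    def __init__(self, c_low, c_mid, c_high, c_out):
        super().__init__()
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, c_out, kernel_size=1)
             for c in (c_low, c_mid, c_high)])
        self.merge = nn.Conv2d(c_out, c_out, kernel_size=3, padding=1)

    def forward(self, f_low, f_mid, f_high):
        # Resample the finer and coarser neighbors to the middle scale.
        size = f_mid.shape[-2:]
        x = sum(F.interpolate(p(f), size=size, mode="bilinear",
                              align_corners=False)
                for p, f in zip(self.proj, (f_low, f_mid, f_high)))
        return self.merge(F.relu(x))
```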
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.