Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning
- URL: http://arxiv.org/abs/2106.14467v1
- Date: Mon, 28 Jun 2021 08:29:55 GMT
- Title: Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning
- Authors: Yi Zhang and Sheng Huang and Xi Peng and Dan Yang
- Abstract summary: We present a novel multi-modal data augmentation approach named Dizygotic Conditional Variational AutoEncoder (DCVAE).
DCVAE synthesizes features by pairing two Conditional Variational AutoEncoders (CVAEs) that share the same seed but receive different modality conditions, in a dizygotic symbiosis manner.
The features generated by the two CVAEs are adaptively combined to yield the final feature, which can be converted back into its paired conditions.
- Score: 19.854565192491123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data augmentation is a powerful technique for improving the performance of
few-shot classification. It generates additional samples as supplements, so that the
task can be transformed into a standard supervised learning problem. However, most
mainstream data-augmentation-based approaches consider only single-modality
information, which limits the diversity and quality of the generated features. In this
paper, we present a novel multi-modal data augmentation approach named Dizygotic
Conditional Variational AutoEncoder (DCVAE) to address this issue. DCVAE synthesizes
features by pairing two Conditional Variational AutoEncoders (CVAEs) that share the
same seed but receive different modality conditions, in a dizygotic symbiosis manner.
The features generated by the two CVAEs are then adaptively combined to yield the
final feature, which can be converted back into its paired conditions while ensuring
these conditions are consistent with the original ones not only in representation but
also in function. DCVAE essentially offers a new approach to data augmentation in
various multi-modal scenarios by exploiting the complementarity of the prior
information from different modalities. Extensive experimental results demonstrate that
our work achieves state-of-the-art performance on the miniImageNet, CIFAR-FS, and CUB
datasets, and that it works well even when some modalities are partially absent.
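
As a rough illustration of the dizygotic pairing described in the abstract, here is a minimal sketch assuming one CVAE is conditioned on a label embedding and the other on a semantic (e.g., attribute or word-vector) embedding; all module names, dimensions, and the gating combiner are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """A plain conditional VAE decoder path: maps a latent code plus a
    condition vector to a synthetic feature."""
    def __init__(self, feat_dim, cond_dim, latent_dim, hidden=256):
        super().__init__()
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def decode(self, z, cond):
        return self.dec(torch.cat([z, cond], dim=-1))

class DizygoticCVAE(nn.Module):
    """Two CVAEs share one latent seed (the 'same seed') but receive
    different modality conditions; their outputs are adaptively combined."""
    def __init__(self, feat_dim, cond_a_dim, cond_b_dim, latent_dim=64):
        super().__init__()
        self.latent_dim = latent_dim
        self.cvae_a = CVAE(feat_dim, cond_a_dim, latent_dim)
        self.cvae_b = CVAE(feat_dim, cond_b_dim, latent_dim)
        # Hypothetical gating network for the adaptive combination.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())

    def synthesize(self, cond_a, cond_b):
        z = torch.randn(cond_a.size(0), self.latent_dim)  # shared seed
        x_a = self.cvae_a.decode(z, cond_a)  # conditioned on modality A
        x_b = self.cvae_b.decode(z, cond_b)  # conditioned on modality B
        w = self.gate(torch.cat([x_a, x_b], dim=-1))
        return w * x_a + (1.0 - w) * x_b     # adaptively combined feature
```

The paper additionally requires the combined feature to be convertible back into both conditions (consistency in representation and function); that reconstruction path, the encoders, and the training losses are omitted from this sketch.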
Related papers
- MoME: Mixture of Multimodal Experts for Cancer Survival Prediction [46.520971457396726]
Survival analysis, as a challenging task, requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making.
Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separate encoding.
We propose a Biased Progressive Encoding (BPE) paradigm, performing encoding and fusion simultaneously.
arXiv Detail & Related papers (2024-06-14T03:44:33Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
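
For intuition only, a toy sketch of the general pattern this entry describes: a small trainable cross-modal layer inserted into otherwise frozen encoders. The gating scheme and all names here are assumptions, not the DG-SCT implementation:

```python
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Toy cross-modal prompt: audio features modulate visual features
    via channel and spatial attention; only the adapter is trainable."""
    def __init__(self, vis_dim, aud_dim):
        super().__init__()
        self.to_channel = nn.Linear(aud_dim, vis_dim)  # channel-wise gate
        self.to_spatial = nn.Linear(aud_dim, vis_dim)  # spatial relevance query

    def forward(self, vis, aud):
        # vis: (B, N, C) patch tokens from a frozen visual encoder
        # aud: (B, D) pooled audio embedding
        ch = torch.sigmoid(self.to_channel(aud)).unsqueeze(1)  # (B, 1, C)
        q = self.to_spatial(aud).unsqueeze(-1)                 # (B, C, 1)
        sp = torch.softmax(vis @ q, dim=1)                     # (B, N, 1)
        return vis * ch * (1.0 + sp)                           # modulated tokens
```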
- Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression, and point-process tasks can facilitate the sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z)
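
As background for this entry, a standard way to couple heterogeneous tasks through a multi-output GP prior is sketched below; the notation is illustrative and not taken from the paper:

```latex
% Shared latent GPs mixed per task; point-process tasks use a Cox
% (log-Gaussian) intensity. Illustrative notation only.
\begin{align}
  g_j &\sim \mathcal{GP}\bigl(0,\, k_j(\mathbf{x}, \mathbf{x}')\bigr),
      \qquad j = 1, \dots, Q, \\
  f_i(\mathbf{x}) &= \sum_{j=1}^{Q} w_{ij}\, g_j(\mathbf{x})
      \qquad \text{(task $i$ mixes the shared latent GPs),} \\
  \lambda_i(\mathbf{x}) &= \exp\bigl(f_i(\mathbf{x})\bigr)
      \qquad \text{(Cox-process intensity for point-process tasks).}
\end{align}
```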
- Coupled Variational Autoencoder [6.599344783327053]
We propose the Coupled Variational Auto-Encoder (C-VAE), which formulates the VAE problem as one of Optimal Transport (OT).
The C-VAE allows greater flexibility in priors and natural resolution of the prior hole problem.
We show that the C-VAE outperforms alternatives including VAE, WAE, and InfoVAE in fidelity to the data, quality of the latent representation, and in quality of generated samples.
arXiv Detail & Related papers (2023-06-05T03:36:31Z)
- Optimal Condition Training for Target Source Separation [56.86138859538063]
We propose a new optimal condition training method for single-channel target source separation.
We show that the complementary information carried by the diverse semantic concepts significantly helps to disentangle and isolate sources of interest.
arXiv Detail & Related papers (2022-11-11T00:04:55Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and consistently improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
The Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
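
To make the pairwise-fusion idea concrete, here is a minimal sketch of one bimodal fusion block using cross-attention; BBFN's actual architecture (with its gated control of correlation) differs, and all names here are hypothetical:

```python
import torch
import torch.nn as nn

class BimodalFusion(nn.Module):
    """Toy pairwise fusion: two modalities attend to each other and the
    pooled results are concatenated."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):
        # a, b: (B, T, dim) token sequences from two modalities
        a2, _ = self.a_to_b(a, b, b)  # a queries b
        b2, _ = self.b_to_a(b, a, a)  # b queries a
        return torch.cat([a2.mean(1), b2.mean(1)], dim=-1)  # (B, 2*dim)
```

Two such blocks, e.g. fuse(text, audio) and fuse(text, video), would then run in parallel, with their outputs concatenated for prediction, mirroring the two bimodal pairs this entry mentions.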
- Progressive Open-Domain Response Generation with Multiple Controllable Attributes [13.599621571488033]
We propose a Progressively trained Hierarchical Vari-Decoder (PHED) to tackle this task.
PHED deploys a Conditional Variational AutoEncoder (CVAE) on top of a Transformer to include one aspect of the attributes at each stage.
PHED significantly outperforms the state-of-the-art neural generation models and produces more diverse responses as expected.
arXiv Detail & Related papers (2021-06-07T08:48:39Z)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in the presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
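
A minimal sketch of jointly encouraging invariance and equivariance to a set of geometric transformations, using a shared backbone with two heads; the loss design is an illustrative assumption, not the paper's exact mechanism:

```python
import torch
import torch.nn.functional as F

def inv_equiv_losses(backbone, inv_head, equiv_head, x, transforms):
    """Toy objective: a shared backbone feeds two heads, one trained to be
    invariant to the transformations, one trained to identify which
    transformation was applied (an equivariance surrogate)."""
    z_ref = inv_head(backbone(x))
    inv_loss, eq_loss = 0.0, 0.0
    for t_idx, t in enumerate(transforms):          # e.g. rotations, flips
        h = backbone(t(x))
        inv_loss += F.mse_loss(inv_head(h), z_ref)  # invariant head: stay put
        target = torch.full((x.size(0),), t_idx)    # equivariant head: recover
        eq_loss += F.cross_entropy(equiv_head(h), target)  # the transform
    return inv_loss, eq_loss
```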
- Self-Supervised Variational Auto-Encoders [10.482805367361818]
We present a novel class of generative models, called the self-supervised Variational Auto-Encoder (selfVAE).
This class of models allows performing both conditional and unconditional sampling while simplifying the objective function.
We present the performance of our approach on three benchmark image datasets (Cifar10, Imagenette64, and CelebA).
arXiv Detail & Related papers (2020-10-05T13:42:28Z)
- BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders [9.51828574518325]
Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction.
We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE.
arXiv Detail & Related papers (2020-03-06T23:10:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.