Cross-Modal Generative Augmentation for Visual Question Answering
- URL: http://arxiv.org/abs/2105.04780v1
- Date: Tue, 11 May 2021 04:51:26 GMT
- Title: Cross-Modal Generative Augmentation for Visual Question Answering
- Authors: Zixu Wang, Yishu Miao, Lucia Specia
- Abstract summary: This paper introduces a generative model for data augmentation by leveraging the correlations among multiple modalities.
The proposed model is able to quantify the confidence of augmented data by its generative probability, and can be jointly updated with a downstream pipeline.
- Score: 34.9601948665926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation is an approach that can effectively improve the performance
of multimodal machine learning. This paper introduces a generative model for
data augmentation by leveraging the correlations among multiple modalities.
Unlike conventional data augmentation approaches that apply low-level
operations with deterministic heuristics, our method learns an
augmentation sampler that generates samples of the target modality conditioned
on observed modalities in the variational auto-encoder framework. Additionally,
the proposed model is able to quantify the confidence of augmented data by its
generative probability, and can be jointly updated with a downstream pipeline.
Experiments on Visual Question Answering tasks demonstrate the effectiveness of
the proposed generative model, which boosts strong UpDn-based models to
state-of-the-art performance.
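As a rough illustration of the idea, here is a minimal conditional-VAE augmentation sampler in PyTorch. The dimensions, the MSE-based Gaussian likelihood, and the prior-density confidence proxy are assumptions made for the sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAESampler(nn.Module):
    """Conditional VAE: q(z|x,c) encoder and p(x|z,c) decoder, where x is the
    target modality (e.g. image features) and c the observed modality
    (e.g. a question embedding)."""
    def __init__(self, x_dim=2048, c_dim=512, z_dim=64, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))
        self.z_dim = z_dim

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        x_hat = self.dec(torch.cat([z, c], dim=-1))
        recon = F.mse_loss(x_hat, x, reduction="none").sum(-1)  # Gaussian NLL up to a constant
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1)
        return (recon + kl).mean()  # negative ELBO; can be summed with the VQA loss

    @torch.no_grad()
    def sample(self, c, n=5):
        """Draw n augmented samples per condition; score each draw by its
        relative density under the latent prior as a confidence proxy."""
        z = torch.randn(n, c.size(0), self.z_dim, device=c.device)
        x_aug = self.dec(torch.cat([z, c.unsqueeze(0).expand(n, -1, -1)], dim=-1))
        conf = (-0.5 * z.pow(2).sum(-1)).softmax(dim=0)
        return x_aug, conf
```

Because the negative ELBO is an ordinary differentiable loss, it can be added to the downstream VQA objective and both updated jointly, as the abstract describes.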
Related papers
- GASE: Generatively Augmented Sentence Encoding [0.0]
We propose an approach to enhance sentence embeddings by applying generative text models for data augmentation at inference time.
Generatively Augmented Sentence Encoding uses diverse synthetic variants of input texts, generated by paraphrasing, summarising, or extracting keywords.
We find that generative augmentation leads to larger performance improvements for embedding models with lower baseline performance.
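A minimal sketch of this inference-time scheme follows. `generate_variants` is a hypothetical stand-in for any text generator (paraphraser, summariser, keyword extractor), and mean-pooling is one simple aggregation choice, not necessarily the paper's.

```python
from typing import Callable, List
import numpy as np

def generate_variants(text: str) -> List[str]:
    # Placeholder: in practice, prompt a generative text model for
    # paraphrases, a summary, and extracted keywords of `text`.
    return [text]

def augmented_embedding(text: str,
                        embed: Callable[[str], np.ndarray]) -> np.ndarray:
    """Embed the original text and its synthetic variants, then pool."""
    variants = [text] + generate_variants(text)
    vecs = np.stack([embed(v) for v in variants])
    return vecs.mean(axis=0)  # mean-pooling as one simple aggregation
```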
arXiv Detail & Related papers (2024-11-07T17:53:47Z)
- EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification [10.334396596691048]
We propose EntAugment, a tuning-free and adaptive DA framework.
It dynamically assesses and adjusts the augmentation magnitudes for each sample during training.
We also introduce a novel entropy regularization term, EntLoss, which complements the EntAugment approach.
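One way to realize an entropy-driven, per-sample magnitude is sketched below; the exact mapping and the form of EntLoss in the paper may differ, so treat both functions as illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def augmentation_magnitudes(logits: torch.Tensor) -> torch.Tensor:
    """Map per-sample prediction entropy to an augmentation magnitude in [0, 1].
    Confident (low-entropy) samples receive stronger augmentation."""
    probs = F.softmax(logits, dim=-1)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    ent_norm = ent / math.log(logits.size(-1))  # normalize to [0, 1]
    return 1.0 - ent_norm  # one simple monotone mapping

def ent_loss(logits: torch.Tensor) -> torch.Tensor:
    """An entropy regularizer in the spirit of EntLoss (form assumed here)."""
    probs = F.softmax(logits, dim=-1)
    return (probs * probs.clamp_min(1e-12).log()).sum(-1).mean()
```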
arXiv Detail & Related papers (2024-09-10T07:42:47Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
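A background-augmentation pass of this kind can be sketched with off-the-shelf diffusion inpainting: keep the annotated objects, repaint everything else. The model id, file path, prompt, and box are illustrative, and this is not necessarily the paper's exact pipeline.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageDraw

def background_mask(size, boxes):
    """White = repaint (background), black = keep (annotated objects)."""
    mask = Image.new("L", size, 255)
    draw = ImageDraw.Draw(mask)
    for x0, y0, x1, y1 in boxes:
        draw.rectangle([x0, y0, x1, y1], fill=0)
    return mask

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("sample.jpg").convert("RGB").resize((512, 512))
mask = background_mask(image.size, boxes=[(120, 150, 300, 420)])  # example box
augmented = pipe(prompt="a rainy city street", image=image,
                 mask_image=mask).images[0]  # same objects, new background
```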
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
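The core of a perception-aware training signal can be sketched as follows: a frozen segmentation model scores generated images against target masks, and that term is added to the generative loss. DetDiffusion's actual P.A. loss is more involved; this is only the shape of the idea.

```python
import torch
import torch.nn.functional as F

def perception_aware_loss(generated, target, seg_model, seg_labels, weight=0.1):
    """Generative reconstruction term plus a perception-aware term from a
    frozen segmentation head (weighting is an illustrative choice)."""
    recon = F.mse_loss(generated, target)          # generative term
    seg_logits = seg_model(generated)              # frozen perceptive model
    pa = F.cross_entropy(seg_logits, seg_labels)   # perception-aware term
    return recon + weight * pa
```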
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
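The synchronized synthesis loop can be sketched schematically. Both generators below are hypothetical stand-ins: `ask_llm` for a ChatGPT-style model that writes a scene prompt plus a grounded dialogue, and `text_to_image` for a diffusion model.

```python
from typing import Dict

def ask_llm(instruction: str) -> Dict[str, str]:
    # Placeholder for an LLM call returning a scene description and a
    # multi-turn Q&A dialogue about that scene.
    return {"image_prompt": "a dog catching a frisbee in a park",
            "dialogue": "Q: What is the dog doing? A: Catching a frisbee."}

def text_to_image(prompt: str) -> bytes:
    # Placeholder for a text-to-image model (e.g. Stable Diffusion).
    return b""

def synthesize_pair(topic: str) -> Dict[str, object]:
    """Produce one image-dialogue training pair for visual instruction tuning."""
    spec = ask_llm(f"Invent a visual scene about {topic} and a dialogue about it.")
    return {"image": text_to_image(spec["image_prompt"]),
            "dialogue": spec["dialogue"]}
```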
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which admits both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
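The optimism bonus in such planners is typically elliptical in the embedding space; the standard LinUCB-style form is sketched below. The paper's kernel-embedding construction is more involved, so this is only the common core of the idea.

```python
import numpy as np

class UCBBonus:
    """Elliptical UCB bonus over state-action embeddings phi(s, a)."""
    def __init__(self, dim: int, beta: float = 1.0, reg: float = 1.0):
        self.cov = reg * np.eye(dim)  # regularized feature covariance
        self.beta = beta

    def update(self, phi: np.ndarray) -> None:
        self.cov += np.outer(phi, phi)  # accumulate visited embeddings

    def bonus(self, phi: np.ndarray) -> float:
        # beta * sqrt(phi^T cov^{-1} phi): large in rarely-visited directions
        return self.beta * float(np.sqrt(phi @ np.linalg.solve(self.cov, phi)))
```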
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to inject stochasticity into the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
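Injecting stochasticity into a deep collaborative-filtering latent space amounts to a reparameterized sampling step, sketched below with hypothetical dimensions (the paper evaluates several such variational variants).

```python
import torch
import torch.nn as nn

class VariationalCF(nn.Module):
    """Deep CF model whose latent interaction vector is sampled, not fixed."""
    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)
        self.item = nn.Embedding(n_items, dim)
        self.mu = nn.Linear(2 * dim, dim)
        self.logvar = nn.Linear(2 * dim, dim)
        self.out = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, u, i):
        h = torch.cat([self.user(u), self.item(i)], dim=-1)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # stochastic latent
        return self.out(z).squeeze(-1)  # predicted interaction probability
```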
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
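Schematically, stage 1 is a penalty-based disentangling autoencoder (e.g. a beta-VAE), and stage 2 is a second model that refines the coarse reconstruction by capturing correlated structure the disentangled latents miss. The residual `Refiner` below is an illustrative placeholder, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Refiner(nn.Module):
    """Stage 2: maps (coarse reconstruction, disentangled latents) to a
    sharper reconstruction, modeling what stage 1 left out."""
    def __init__(self, x_dim: int, z_dim: int, h_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, coarse_x, z):
        # Residual refinement keeps stage 1's output as the base.
        return coarse_x + self.net(torch.cat([coarse_x, z], dim=-1))
```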
arXiv Detail & Related papers (2020-10-25T18:51:15Z)