Explaining latent representations of generative models with large multimodal models
- URL: http://arxiv.org/abs/2402.01858v3
- Date: Thu, 18 Apr 2024 03:54:39 GMT
- Title: Explaining latent representations of generative models with large multimodal models
- Authors: Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, Liang Zhao,
- Abstract summary: Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence.
We propose a framework to comprehensively explain each latent variable in the generative models using a large multimodal model.
- Score: 5.9908087713968925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning interpretable representations of data generative latent factors is an important topic for the development of artificial intelligence. With the rise of the large multimodal model, it can align images with text to generate answers. In this work, we propose a framework to comprehensively explain each latent variable in the generative models using a large multimodal model. We further measure the uncertainty of our generated explanations, quantitatively evaluate the performance of explanation generation among multiple large multimodal models, and qualitatively visualize the variations of each latent variable to learn the disentanglement effects of different generative models on explanations. Finally, we discuss the explanatory capabilities and limitations of state-of-the-art large multimodal models.
Related papers
- Diffusion Models for Multi-Task Generative Modeling [32.61765315067488]
We propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space.
We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling.
arXiv Detail & Related papers (2024-07-24T18:04:17Z) - LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multi-modal Foundation Models [4.675123839851372]
This paper introduces LatentExplainer, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models.
LatentExplainer tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability.
We evaluate our proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations of latent variables.
arXiv Detail & Related papers (2024-06-21T04:39:03Z) - Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models [51.21351775178525]
DiffExplainer is a novel framework that, leveraging language-vision models, enables multimodal global explainability.
It employs diffusion models conditioned on optimized text prompts, synthesizing images that maximize class outputs.
The analysis of generated visual descriptions allows for automatic identification of biases and spurious features.
arXiv Detail & Related papers (2024-04-03T10:11:22Z) - Revealing Multimodal Contrastive Representation Learning through Latent
Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational bound that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that generalize PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks.
arXiv Detail & Related papers (2023-09-01T10:32:21Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized
Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - A survey of multimodal deep generative models [20.717591403306287]
Multimodal learning is a framework for building models that make predictions based on different types of modalities.
Deep generative models in which distributions are parameterized by deep neural networks have attracted much attention.
arXiv Detail & Related papers (2022-07-05T15:48:51Z) - Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [55.28436972267793]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Learning Structured Latent Factors from Dependent Data:A Generative
Model Framework from Information-Theoretic Perspective [18.88255368184596]
We present a novel framework for learning generative models with various underlying structures in the latent space.
Our model provides a principled approach to learn a set of semantically meaningful latent factors that reflect various types of desired structures.
arXiv Detail & Related papers (2020-07-21T06:59:29Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal
Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.