Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension
- URL: http://arxiv.org/abs/2302.09301v1
- Date: Thu, 16 Feb 2023 16:22:30 GMT
- Authors: Henry Kvinge, Davis Brown, Charles Godfrey
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompting has become an important mechanism by which users can more
effectively interact with many kinds of foundation models. Indeed, the last
several years have shown that well-honed prompts can sometimes unlock emergent
capabilities within such models. While there has been a substantial amount of
empirical exploration of prompting within the community, relatively few works
have studied prompting at a mathematical level. In this work we aim to take a
first step towards understanding basic geometric properties induced by prompts
in Stable Diffusion, focusing on the intrinsic dimension of internal
representations within the model. We find that choice of prompt has a
substantial impact on the intrinsic dimension of representations at both of the
model layers we explored, but that the nature of this impact depends on
the layer being considered. For example, in certain bottleneck layers of the
model, intrinsic dimension of representations is correlated with prompt
perplexity (measured using a surrogate model), while this correlation is not
apparent in the latent layers. Our evidence suggests that intrinsic dimension
could be a useful tool for future studies of the impact of different prompts on
text-to-image models.
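The abstract names intrinsic dimension and prompt perplexity but does not say which estimator, surrogate language model, or correlation measure was used. The sketch below is therefore only an illustration under stated assumptions. It uses the TwoNN maximum-likelihood estimator, a common intrinsic-dimension estimator chosen here as a stand-in, applied to a matrix of flattened activations (for example, one row per image generated from a single prompt with a different seed). Perplexity is computed from per-token natural-log probabilities that some surrogate model would supply, and Spearman rank correlation is one possible way to relate the two quantities across prompts. The function names `twonn_intrinsic_dimension`, `perplexity`, and `spearman` are hypothetical.

```python
import numpy as np


def twonn_intrinsic_dimension(X) -> float:
    """TwoNN maximum-likelihood estimate of intrinsic dimension (assumed estimator).

    X has shape (n_points, n_features), e.g. flattened activations from one
    layer, collected for many samples generated from a single prompt.
    """
    X = np.asarray(X, dtype=np.float64)
    sq = np.einsum("ij,ij->i", X, X)
    # Squared pairwise distances via |a|^2 + |b|^2 - 2 a.b, clipped for round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    # After partitioning, column 0 <= column 1 hold the two smallest distances.
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0  # drop exact duplicates, whose distance ratio is undefined
    mu = r2[keep] / r1[keep]
    # Under the TwoNN model log(mu) ~ Exponential(d), so the MLE is n / sum(log mu).
    return float(keep.sum() / np.sum(np.log(mu)))


def perplexity(token_logprobs) -> float:
    """Perplexity from natural-log token probabilities given by a surrogate LM."""
    lp = np.asarray(token_logprobs, dtype=np.float64)
    return float(np.exp(-lp.mean()))


def spearman(a, b) -> float:
    """Spearman rank correlation; assumes no ties, for brevity."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic sanity check: a 2-D square linearly embedded in 10-D space.
    plane = rng.uniform(size=(2000, 2)) @ rng.normal(size=(2, 10))
    print("estimated ID:", twonn_intrinsic_dimension(plane))  # should be close to 2
```

In a study like the one described, one would compute one intrinsic-dimension value per prompt and layer and one perplexity value per prompt, then pass the two per-prompt lists to `spearman`. The pairwise-distance matrix is quadratic in the number of samples, which is acceptable for a few thousand points. Larger sets would need a nearest-neighbor index instead.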