Related papers: Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension

URL: http://arxiv.org/abs/2302.09301v1
Date: Thu, 16 Feb 2023 16:22:30 GMT
Title: Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension
Authors: Henry Kvinge, Davis Brown, Charles Godfrey
Abstract summary: We take a first step towards understanding basic geometric properties induced by prompts in Stable Diffusion. We find that choice of prompt has a substantial impact on the intrinsic dimension of representations at both layers of the model that were explored. Our evidence suggests that intrinsic dimension could be a useful tool for future studies of the impact of different prompts on text-to-image models.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prompting has become an important mechanism by which users can more effectively interact with many flavors of foundation model. Indeed, the last several years have shown that well-honed prompts can sometimes unlock emergent capabilities within such models. While there has been a substantial amount of empirical exploration of prompting within the community, relatively few works have studied prompting at a mathematical level. In this work we aim to take a first step towards understanding basic geometric properties induced by prompts in Stable Diffusion, focusing on the intrinsic dimension of internal representations within the model. We find that choice of prompt has a substantial impact on the intrinsic dimension of representations at both layers of the model which we explored, but that the nature of this impact depends on the layer being considered. For example, in certain bottleneck layers of the model, intrinsic dimension of representations is correlated with prompt perplexity (measured using a surrogate model), while this correlation is not apparent in the latent layers. Our evidence suggests that intrinsic dimension could be a useful tool for future studies of the impact of different prompts on text-to-image models.

Related papers

When Does Pruning Benefit Vision Representations? [6.306016476757605]
Pruning is widely used to reduce the complexity of deep learning models, but its effects on interpretability and representation learning remain poorly understood. This paper investigates how pruning influences vision models across three key dimensions: (i) interpretability, (ii) unsupervised object discovery, and (iii) alignment with human perception.
arXiv Detail & Related papers (2025-07-02T13:57:49Z)
Connecting Neural Models Latent Geometries with Relative Geodesic Representations [21.71782603770616]
We show that when a latent structure is shared between distinct latent spaces, relative distances between representations can be preserved, up to distortions. We assume that distinct neural models parametrize approximately the same underlying manifold, and introduce a representation based on the pullback metric. We validate our method on model stitching and retrieval tasks, covering autoencoders and vision foundation discriminative models.
arXiv Detail & Related papers (2025-06-02T12:34:55Z)
FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand. The proposed idea gains superior performance and higher run-time efficiency against strong baselines. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z)
Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models. Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers [1.7418480517632609]
Pre-trained large foundation models play a central role in the recent surge of artificial intelligence. Due to their inherent complexity, these models are not well understood. We show via analyses and systematic experiments that the representation space consists of large piecewise linear subspaces.
arXiv Detail & Related papers (2024-01-28T04:59:51Z)
Implicit Modeling of Non-rigid Objects with Cross-Category Signals [28.956412015920936]
MODIF is a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. We show that MODIF can proficiently learn the shape representation of each organ and their relations to others, to the point that shapes missing from unseen instances can be consistently recovered.
arXiv Detail & Related papers (2023-12-15T22:34:17Z)
On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
arXiv Detail & Related papers (2023-10-06T17:50:38Z)
Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning. We consider a setting where the pretraining corpus consists of multitask demonstrations. We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis [20.316056261749946]
We propose an end-to-end vision and language model incorporating explicit knowledge graphs. We also introduce an interactive out-of-distribution layer using implicit network operator. In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval.
arXiv Detail & Related papers (2023-02-11T05:46:21Z)
Disentangling Shape and Pose for Object-Centric Deep Active Inference Models [4.298360054690217]
We consider the problem of 3D object representation, and focus on different instances of the ShapeNet dataset. We propose a model that factorizes object shape, pose and category, while still learning a representation for each factor using a deep neural network.
arXiv Detail & Related papers (2022-09-16T12:53:49Z)
Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models. We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z)
Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets. This contribution should be regarded as a systematic approach to represent structural causal models by credal networks. Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
Causal Discovery in Physical Systems from Videos [123.79211190669821]
Causal discovery is at the core of human cognition. We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure.
arXiv Detail & Related papers (2020-07-01T17:29:57Z)
Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
Black-box nature of Deep Learning models has posed unanswered questions about what they learn from data. Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model. Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.