The Evolving Nature of Latent Spaces: From GANs to Diffusion
- URL: http://arxiv.org/abs/2510.17383v2
- Date: Mon, 10 Nov 2025 10:23:35 GMT
- Title: The Evolving Nature of Latent Spaces: From GANs to Diffusion
- Authors: Ludovica Schaerf
- Abstract summary: We show how diffusion models fragment the burden of representation and thereby challenge assumptions of a unified internal space. We argue for a reorientation of how generative AI is understood: not as a direct synthesis of content, but as an emergent configuration of specialized processes.
- Score: 2.9612444540570113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines the evolving nature of internal representations in generative visual models, focusing on the conceptual and technical shift from GANs and VAEs to diffusion-based architectures. Drawing on Beatrice Fazi's account of synthesis as the amalgamation of distributed representations, we propose a distinction between "synthesis in a strict sense," where a compact latent space wholly determines the generative process, and "synthesis in a broad sense," which characterizes models whose representational labor is distributed across layers. Through close readings of model architectures and a targeted experimental setup that intervenes in layerwise representations, we show how diffusion models fragment the burden of representation and thereby challenge assumptions of a unified internal space. By situating these findings within media theoretical frameworks and critically engaging with metaphors such as the latent space and the Platonic Representation Hypothesis, we argue for a reorientation of how generative AI is understood: not as a direct synthesis of content, but as an emergent configuration of specialized processes.
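The paper's experiment intervenes in layerwise representations of a diffusion model. A minimal sketch of what such an intervention can look like in PyTorch is shown below, using a toy denoiser and a forward hook; the toy model and layer names are hypothetical stand-ins for a real pretrained diffusion U-Net, not the authors' setup.

```python
# Minimal sketch: perturb one layer's activations in a toy denoiser via a
# forward hook, mimicking a layerwise intervention. Module names and sizes
# are hypothetical stand-ins for a pretrained diffusion U-Net.
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.early = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())
        self.mid = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())
        self.late = nn.Linear(dim, dim)

    def forward(self, x):
        return self.late(self.mid(self.early(x)))

model = ToyDenoiser()

def intervene(module, inputs, output):
    # Replace half of the representation with noise: if generation degrades
    # gracefully rather than collapsing, the representational burden carried
    # by this layer is only partial.
    return 0.5 * output + 0.5 * torch.randn_like(output)

handle = model.mid.register_forward_hook(intervene)
x = torch.randn(8, 64)              # stand-in for a noised latent batch
out_intervened = model(x)
handle.remove()
out_clean = model(x)
print((out_intervened - out_clean).abs().mean())  # effect size of the edit
```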
Related papers
- Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z)
- Guided and Unguided Conditional Diffusion Mechanisms for Structured and Semantically-Aware 3D Point Cloud Generation [0.0]
Generating realistic 3D point clouds is a fundamental problem in computer vision with applications in remote sensing, robotics, and digital object modeling. We propose a diffusion-based framework that embeds per-point semantic conditioning directly within generation. This design produces point clouds that are both structurally coherent and segmentation-aware, with object parts explicitly represented during synthesis.
arXiv Detail & Related papers (2025-09-21T19:19:36Z)
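A minimal sketch of the per-point conditioning idea from the entry above: concatenate each noisy point's coordinates with a one-hot part label before denoising. The MLP denoiser and all shapes are illustrative assumptions, not the paper's model.

```python
# Sketch of per-point semantic conditioning in a point-cloud denoiser:
# each point carries its part label as extra input channels.
import torch
import torch.nn as nn

num_parts = 4
denoiser = nn.Sequential(
    nn.Linear(3 + num_parts, 128), nn.SiLU(),
    nn.Linear(128, 128), nn.SiLU(),
    nn.Linear(128, 3),             # predicts per-point noise
)

points = torch.randn(1, 1024, 3)                 # noisy point cloud
labels = torch.randint(0, num_parts, (1, 1024))  # per-point part ids
cond = torch.nn.functional.one_hot(labels, num_parts).float()
eps_hat = denoiser(torch.cat([points, cond], dim=-1))
print(eps_hat.shape)                             # (1, 1024, 3)
```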
- Escaping Plato's Cave: JAM for Aligning Independently Trained Vision and Language Models [30.07172193932125]
We show that the Joint Autoencoder Modulator (JAM) induces alignment even across independently trained representations. Our findings offer theoretical insight into the structure of shared semantics and practical guidance for transforming generalist unimodal foundations into specialist multimodal models.
arXiv Detail & Related papers (2025-07-01T21:43:50Z)
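A minimal sketch of a joint autoencoder over two frozen, independently trained embeddings, in the spirit of the entry above: encode their concatenation into a shared latent and reconstruct both. Dimensions and the reconstruction loss are assumptions, not JAM's published configuration.

```python
# Sketch: a shared bottleneck over frozen unimodal embeddings; training
# this autoencoder is what would induce cross-modal alignment.
import torch
import torch.nn as nn

d_img, d_txt, d_shared = 512, 768, 256
encoder = nn.Linear(d_img + d_txt, d_shared)
dec_img = nn.Linear(d_shared, d_img)
dec_txt = nn.Linear(d_shared, d_txt)

img_emb = torch.randn(32, d_img)   # frozen vision-encoder outputs
txt_emb = torch.randn(32, d_txt)   # frozen language-encoder outputs

z = encoder(torch.cat([img_emb, txt_emb], dim=-1))
loss = (nn.functional.mse_loss(dec_img(z), img_emb)
        + nn.functional.mse_loss(dec_txt(z), txt_emb))
loss.backward()                    # gradients flow only into the bottleneck
```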
- The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology [4.280045926995889]
This study focuses on how adversarial inputs systematically affect the internal representation spaces of Large Language Models. By quantifying the shape of activations and neuronal information flow, our architecture-agnostic framework reveals fundamental invariants of representational change.
arXiv Detail & Related papers (2025-05-26T18:31:49Z)
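To make the "shape of activations" idea in the entry above concrete: the 0-dimensional persistence diagram of an activation point cloud can be read off a minimum spanning tree, where each edge weight is the death time of one connected component. This is only the H0 core; full TDA pipelines also compute higher-dimensional features, and the random activations below are a stand-in.

```python
# Sketch: 0-dimensional persistent homology of a layer's activations
# via a minimum spanning tree over pairwise distances.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

acts = np.random.randn(200, 64)     # stand-in for layer activations
dists = squareform(pdist(acts))     # dense pairwise distance matrix
mst = minimum_spanning_tree(dists)
deaths = np.sort(mst.data)          # H0 features: born at 0, die at merges
print(deaths[:5], deaths[-1])       # smallest and largest merge scales
```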
- Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives [55.22101595974193]
Recently, continuous representation methods have emerged as novel paradigms that characterize the intrinsic structures of real-world data. This review focuses on three aspects: (i) continuous representation method designs such as basis function representation, statistical modeling, tensor function decomposition, and implicit neural representation; (ii) theoretical foundations of continuous representations such as approximation error analysis, convergence properties, and implicit regularization; and (iii) real-world applications of continuous representations in computer vision, graphics, bioinformatics, and remote sensing.
arXiv Detail & Related papers (2025-05-21T07:50:19Z)
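A minimal sketch of one design named in the entry above, an implicit neural representation: a small MLP maps coordinates to signal values, so the network weights themselves become the continuous representation. The 1D sine target and all sizes are arbitrary stand-ins.

```python
# Sketch: fit a coordinate MLP to a 1D signal; after fitting, the
# signal can be queried at any continuous coordinate.
import torch
import torch.nn as nn

inr = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(inr.parameters(), lr=1e-3)

x = torch.linspace(-1, 1, 256).unsqueeze(-1)   # coordinates
y = torch.sin(3.14159 * x)                     # signal to encode

for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(inr(x), y)
    loss.backward()
    opt.step()

print(inr(torch.tensor([[0.25]])).item())      # query off-grid coordinate
```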
- Can Diffusion Models Disentangle? A Theoretical Perspective [37.21661224725838]
This paper presents a novel theoretical framework for understanding how diffusion models can learn disentangled representations. We establish identifiability conditions for general disentangled latent variable models, analyze training dynamics, and derive sample complexity bounds for disentangled latent subspace models.
arXiv Detail & Related papers (2025-03-31T20:46:18Z)
- Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement [58.9768112704998]
Disentangled representation learning strives to extract the intrinsic factors within observed data.
We introduce a new perspective and framework, demonstrating that diffusion models with cross-attention can serve as a powerful inductive bias.
This is the first work to reveal the potent disentanglement capability of diffusion models with cross-attention, requiring no complex designs.
arXiv Detail & Related papers (2024-02-15T05:07:54Z)
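A minimal sketch of the cross-attention bottleneck described in the entry above: image tokens query a small set of learned "concept" tokens, so each concept can specialize in one factor of variation. Token counts and dimensions are illustrative assumptions.

```python
# Sketch: cross-attention from denoiser feature-map tokens to a small
# bank of learned concept tokens.
import torch
import torch.nn as nn

d, n_concepts = 64, 8
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
concepts = nn.Parameter(torch.randn(1, n_concepts, d))  # learned tokens

img_tokens = torch.randn(2, 256, d)       # image tokens inside a denoiser
out, weights = attn(query=img_tokens,
                    key=concepts.expand(2, -1, -1),
                    value=concepts.expand(2, -1, -1))
print(out.shape, weights.shape)           # (2, 256, 64), (2, 256, 8)
```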
- Discrete, compositional, and symbolic representations through attractor dynamics [51.20712945239422]
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT).
Our model segments the continuous representational space into discrete basins, with attractor states corresponding to symbolic sequences that reflect the semanticity and compositionality characteristic of symbolic systems, learned in an unsupervised manner rather than relying on pre-defined primitives.
This approach establishes a unified framework that integrates symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
arXiv Detail & Related papers (2023-10-03T05:40:56Z)
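The basin-carving idea in the entry above has a classic minimal instance: a Hopfield network with Hebbian weights, in which iterating the update from a noisy start falls into the basin of a stored pattern, i.e. a discrete, symbol-like state. The sketch below is that classic model, not the paper's architecture.

```python
# Sketch: attractor dynamics segmenting a continuous space into discrete
# basins (Hopfield network with Hebbian storage).
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 100))  # stored "symbols"
W = patterns.T @ patterns / 100.0
np.fill_diagonal(W, 0.0)

state = patterns[0].copy()
flip = rng.random(100) < 0.2          # corrupt 20% of the bits
state[flip] *= -1

for _ in range(10):                   # synchronous attractor updates
    state = np.sign(W @ state)
    state[state == 0] = 1.0

print(np.mean(state == patterns[0]))  # fraction recovered, typically 1.0
```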
- Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z)
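One plausible reading of the "dependencies between regions according to their appearance correlation" in the entry above: pool features per region, measure their similarity, and let correlated regions share representation. This sketch is entirely illustrative; the paper's actual mechanism may differ.

```python
# Sketch: cosine-similarity "appearance correlation" between region
# features, used as mixing weights so look-alike regions associate.
import torch
import torch.nn.functional as F

feats = torch.randn(5, 32)                  # pooled features for 5 regions
sim = F.cosine_similarity(feats.unsqueeze(1), feats.unsqueeze(0), dim=-1)
weights = torch.softmax(sim / 0.1, dim=-1)  # sharpen toward similar regions
associated = weights @ feats                # regions borrow from look-alikes
print(associated.shape)                     # (5, 32)
```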
- Generalization Properties of Optimal Transport GANs with Latent Distribution Learning [52.25145141639159]
We study how the interplay between the latent distribution and the complexity of the pushforward map affects performance.
Motivated by our analysis, we advocate learning the latent distribution as well as the pushforward map within the GAN paradigm.
arXiv Detail & Related papers (2020-07-29T07:31:33Z)
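A minimal sketch of "learning the latent distribution as well as the pushforward map" from the entry above: make the prior a trainable mixture of Gaussians and sample with the reparameterization trick, so its parameters receive gradients through the generator. All sizes, and the mixture form itself, are assumptions.

```python
# Sketch: a learnable latent distribution trained jointly with the
# generator (pushforward map). Component choice is sampled uniformly;
# means and scales receive gradients via reparameterization.
import torch
import torch.nn as nn

K, d = 10, 32
means = nn.Parameter(torch.randn(K, d))
log_stds = nn.Parameter(torch.zeros(K, d))
generator = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 784))

def sample_latent(n):
    comp = torch.randint(0, K, (n,))                 # pick mixture components
    eps = torch.randn(n, d)
    return means[comp] + log_stds[comp].exp() * eps  # reparameterized draw

fake = generator(sample_latent(16))  # push learned latents forward
fake.mean().backward()               # stand-in for a GAN loss
print(means.grad.abs().sum() > 0)    # latent parameters do get gradients
```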
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained to synthesize images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
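The closed-form idea in the entry above admits a compact sketch: for a first layer y = Az + b, the unit latent directions that change the output most are the top eigenvectors of A^T A, so interpretable directions fall out of the pretrained weights with no training. The random matrix below stands in for a real GAN's first-layer weights.

```python
# Sketch of closed-form latent factorization: eigen-decompose A^T A of
# the (pretrained) first layer and edit the latent along top directions.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1024, 512))      # first layer: latent 512 -> 1024

eigvals, eigvecs = np.linalg.eigh(A.T @ A)
directions = eigvecs[:, ::-1]             # columns, most important first

z = rng.standard_normal(512)
z_edit = z + 3.0 * directions[:, 0]       # move along the top direction
print(np.linalg.norm(A @ z_edit - A @ z)) # large, targeted output change
```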
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.