Visual Language Hypothesis
- URL: http://arxiv.org/abs/2512.23335v2
- Date: Wed, 31 Dec 2025 08:18:13 GMT
- Title: Visual Language Hypothesis
- Authors: Xiu Li
- Abstract summary: We study visual representation learning from a structural and topological perspective. We show that approximating the quotient also places structural demands on the model architecture.
- Score: 14.062822951292402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study visual representation learning from a structural and topological perspective. We begin from a single hypothesis: that visual understanding presupposes a semantic language for vision, in which many perceptual observations correspond to a small number of discrete semantic states. Together with widely assumed premises on transferability and abstraction in representation learning, this hypothesis implies that the visual observation space must be organized in a fiber-bundle-like structure, where nuisance variation populates fibers and semantics correspond to a quotient base space. From this structure we derive two theoretical consequences. First, the semantic quotient X/G is not a submanifold of X and cannot be obtained through smooth deformation alone; semantic invariance requires a non-homeomorphic, discriminative target, for example supervision via labels, cross-instance identification, or multimodal alignment, that supplies explicit semantic equivalence. Second, we show that approximating the quotient also places structural demands on the model architecture. Semantic abstraction requires not only an external semantic target but also a representation mechanism capable of supporting topology change: an expand-and-snap process in which the manifold is first geometrically expanded to separate structure and then collapsed to form discrete semantic regions. We emphasize that these results are interpretive rather than prescriptive: the framework provides a topological lens that aligns with empirical regularities observed in large-scale discriminative and multimodal models, and with classical principles in statistical learning theory.
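To make the expand-and-snap idea concrete, here is a minimal, hedged sketch, an interpretation of the abstract rather than the authors' implementation: an MLP first expands inputs into a higher-dimensional space, and a nearest-code assignment then snaps each point onto one of a few discrete codebook entries. The snap is many-to-one and hence non-homeomorphic, loosely mirroring the quotient map onto X/G. The class name, dimensions, and the straight-through trick are all illustrative assumptions.

```python
# Minimal sketch (not the paper's method) of an "expand and snap" mechanism:
# expand the input into a higher-dimensional space to separate structure,
# then snap each point to its nearest entry in a small discrete codebook.
# The snap is a many-to-one, non-homeomorphic map: nuisance variation
# (fibers) collapses onto discrete semantic states, approximating X -> X/G.
import torch
import torch.nn as nn

class ExpandAndSnap(nn.Module):
    def __init__(self, in_dim=32, expand_dim=256, num_codes=10):
        super().__init__()
        # "Expand": lift to a higher-dimensional space.
        self.expand = nn.Sequential(
            nn.Linear(in_dim, expand_dim),
            nn.ReLU(),
            nn.Linear(expand_dim, expand_dim),
        )
        # Discrete semantic states: one code vector per putative class.
        self.codebook = nn.Parameter(torch.randn(num_codes, expand_dim))

    def forward(self, x):
        z = self.expand(x)                     # (B, expand_dim)
        # "Snap": assign each point to its nearest code (many-to-one).
        dists = torch.cdist(z, self.codebook)  # (B, num_codes)
        idx = dists.argmin(dim=1)              # discrete semantic state
        z_q = self.codebook[idx]               # collapsed representation
        # Straight-through estimator: gradients flow through the snap.
        z_q = z + (z_q - z).detach()
        return z_q, idx

# Per the paper's first result, the snap must be trained against an external
# discriminative target (e.g. labels or multimodal alignment), since smooth
# deformation alone cannot produce the quotient.
model = ExpandAndSnap()
x = torch.randn(8, 32)
z_q, idx = model(x)  # idx: one discrete state per input
```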
Related papers
- Criteria-first, semantics-later: reproducible structure discovery in image-based sciences [0.0]
Structure is recovered by predicting or enforcing domain-specific labels. This paradigm fails systematically under the conditions that make image-based science most valuable. A unified framework for criteria-first structure discovery is introduced.
arXiv Detail & Related papers (2026-02-17T16:45:49Z)
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models [56.656180566692946]
We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
arXiv Detail & Related papers (2025-12-23T02:44:25Z)
- A Unified Linear Algebraic Framework for Physical Models and Generalized Contextuality [0.0]
We operationalize rank separation via two complementary methods provided by the linear-algebraic framework. By reframing contextuality as a problem in matrix analysis, our work provides a unified structure for its systematic study.
arXiv Detail & Related papers (2025-12-10T19:00:09Z)
- Interpretation as Linear Transformation: A Cognitive-Geometric Model of Belief and Meaning [0.0]
I show how belief distortion, motivational drift, counterfactual evaluation, and the limits of mutual understanding arise from purely algebraic constraints. I argue that this cognitive-geometric perspective clarifies the boundaries of influence in both human and artificial systems.
arXiv Detail & Related papers (2025-12-10T17:13:01Z)
- Semantic Attractors and the Emergence of Meaning: Towards a Teleological Model of AGI [0.0]
This essay develops a theoretical framework for a semantic Artificial General Intelligence (AGI) based on the notion of semantic attractors in complex meaning spaces.
arXiv Detail & Related papers (2025-08-21T19:57:52Z)
- How do Transformers Learn Implicit Reasoning? [67.02072851088637]
We study how implicit multi-hop reasoning emerges by training transformers from scratch in a controlled symbolic environment. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures.
arXiv Detail & Related papers (2025-05-29T17:02:49Z)
- Large Language Models as Quasi-crystals: Coherence Without Repetition in Generative Text [0.0]
This essay proposes an analogy between large language models (LLMs) and quasicrystals, systems that exhibit global coherence without periodic repetition, generated through local constraints. Drawing on the history of quasicrystals, it highlights an alternative mode of coherence in generative language: constraint-based organization without repetition or symbolic intent. The essay aims to reframe the current discussion around large language models, not by rejecting existing methods, but by suggesting an additional axis of interpretation grounded in structure rather than semantics.
arXiv Detail & Related papers (2025-04-16T11:27:47Z)
- Learning Visual-Semantic Subspace Representations [49.17165360280794]
We introduce a nuclear norm-based loss function, grounded in the same information-theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class separability, it encodes the spectral geometry of the data within a subspace lattice (a hedged sketch of such a loss appears after this list).
arXiv Detail & Related papers (2024-05-25T12:51:38Z)
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation [59.138470433237615]
We introduce statistical metrics that quantify both the linguistic and visual skew of a dataset for relational learning.
We show that systematically controlled metrics are strongly predictive of generalization performance.
This work points to enhancing data diversity and balance as an important direction, complementary to scaling up absolute dataset size.
arXiv Detail & Related papers (2024-03-25T03:18:39Z)
- Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
With this over-completeness, discriminative features w.r.t. the task can be adaptively formed in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- Unifying Causal Inference and Reinforcement Learning using Higher-Order Category Theory [4.119151469153588]
We present a unified formalism for structure discovery of causal models and predictive state representation models in reinforcement learning.
Specifically, we model structure discovery in both settings using simplicial objects.
arXiv Detail & Related papers (2022-09-13T19:04:18Z)
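As referenced in the "Learning Visual-Semantic Subspace Representations" entry above, the following is a minimal sketch of what a nuclear norm-based loss on batch features can look like. It is one plausible reading of that one-line summary, not the paper's actual objective; the function name, shapes, and the centering step are assumptions.

```python
# Minimal sketch of a nuclear-norm objective on batch features; one plausible
# reading of the "Learning Visual-Semantic Subspace Representations" summary
# above, not that paper's actual loss.
import torch

def nuclear_norm_loss(features: torch.Tensor) -> torch.Tensor:
    """Sum of singular values of the centered batch feature matrix.

    Minimizing it encourages features to occupy a low-dimensional subspace;
    combining within-class and between-class terms can instead shape a
    lattice of subspaces.
    """
    z = features - features.mean(dim=0, keepdim=True)  # center the batch
    return torch.linalg.matrix_norm(z, ord="nuc")      # nuclear norm ||Z||_*

batch = torch.randn(64, 128, requires_grad=True)       # (samples, feat_dim)
loss = nuclear_norm_loss(batch)
loss.backward()  # the nuclear norm is subdifferentiable, so this trains
```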