Cross-Model Semantics in Representation Learning
- URL: http://arxiv.org/abs/2508.03649v1
- Date: Tue, 05 Aug 2025 16:57:24 GMT
- Title: Cross-Model Semantics in Representation Learning
- Authors: Saleh Nikooroo, Thomas Engel
- Abstract summary: We show that structural regularities induce representational geometry that is more stable under architectural variation. This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models.
- Score: 1.2064681974642195
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The internal representations learned by deep networks are often sensitive to architecture-specific choices, raising questions about the stability, alignment, and transferability of learned structure across models. In this paper, we investigate how structural constraints--such as linear shaping operators and corrective paths--affect the compatibility of internal representations across different architectures. Building on the insights from prior studies on structured transformations and convergence, we develop a framework for measuring and analyzing representational alignment across networks with distinct but related architectural priors. Through a combination of theoretical insights, empirical probes, and controlled transfer experiments, we demonstrate that structural regularities induce representational geometry that is more stable under architectural variation. This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models. We conclude with a discussion on the implications of representational transferability for model distillation, modular learning, and the principled design of robust learning systems.
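The abstract describes measuring representational alignment across architectures but does not name a specific alignment measure in this summary. As a hedged illustration only, linear Centered Kernel Alignment (CKA) is one standard way to compare two models' activations on the same inputs; the sketch below uses placeholder activation matrices and a hypothetical `linear_cka` helper, and is not the paper's specific framework.

```python
import numpy as np

def linear_cka(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape
    (n_samples, dim_a) and (n_samples, dim_b), computed on the same inputs."""
    acts_a = acts_a - acts_a.mean(axis=0)   # center each feature
    acts_b = acts_b - acts_b.mean(axis=0)
    cross = np.linalg.norm(acts_b.T @ acts_a, "fro") ** 2
    norm_a = np.linalg.norm(acts_a.T @ acts_a, "fro")
    norm_b = np.linalg.norm(acts_b.T @ acts_b, "fro")
    return float(cross / (norm_a * norm_b))

# Toy usage with placeholder activations from two hypothetical models:
rng = np.random.default_rng(0)
acts_model_a = rng.normal(size=(256, 64))
acts_model_b = acts_model_a @ rng.normal(size=(64, 32))  # a linear re-shaping of the same features
print(round(linear_cka(acts_model_a, acts_model_b), 3))  # alignment score in [0, 1]
```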
Related papers
- Understanding Learning Dynamics Through Structured Representations [1.2064681974642195]
This paper investigates how internal structural choices shape the behavior of learning systems.
We analyze how these structures influence gradient flow, spectral sensitivity, and fixed-point behavior.
Rather than prescribing fixed templates, we emphasize principles of tractable design that can steer learning behavior in interpretable ways.
arXiv Detail & Related papers (2025-08-04T07:15:57Z)
- Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.
We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.
We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
- Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation [3.8073142980733]
This thesis introduces quantitative methods for identifying systematic structure in a mapping between spaces.
I identify structural primitives present in a mapping, along with information-theoretic quantifications of each.
I also introduce a novel, performant approach to estimating the entropy of a vector space, which allows this analysis to be applied to models ranging in size from 1 million to 12 billion parameters.
A hedged baseline sketch of vector-space entropy estimation follows this entry.
arXiv Detail & Related papers (2025-05-29T19:27:50Z)
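The thesis's own entropy estimator is not described in this summary. Purely as a hedged illustration of what estimating the entropy of an embedding space can look like, the sketch below uses a standard Gaussian (log-determinant) baseline over a sample of vectors; it is a stand-in under a strong distributional assumption, not the approach from the thesis.

```python
import numpy as np

def gaussian_entropy_nats(vectors: np.ndarray) -> float:
    """Differential entropy (in nats) of embedding vectors under a Gaussian
    assumption: H = 0.5 * logdet(2 * pi * e * Sigma)."""
    dim = vectors.shape[1]
    cov = np.cov(vectors, rowvar=False) + 1e-6 * np.eye(dim)  # ridge for numerical stability
    _, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
    return 0.5 * logdet

# Toy usage: 10,000 random 32-dimensional "embeddings".
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 32))
print(round(gaussian_entropy_nats(embeddings), 2))  # ~45.4 nats for a standard normal in 32 dims
```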
- Directional Non-Commutative Monoidal Structures for Compositional Embeddings in Machine Learning [0.0]
We introduce a new structure for compositional embeddings built on directional non-commutative monoidal operators.
Our construction defines a distinct composition operator ∘_i for each axis i, ensuring associative combination along each axis without imposing global commutativity.
Operators attached to different axes commute with one another, enforcing a global interchange law that enables consistent cross-axis compositions.
A minimal numeric illustration of these laws follows this entry.
arXiv Detail & Related papers (2025-05-21T13:27:14Z)
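The paper's concrete operators are not given in this summary. As a purely illustrative instance of axis-specific, associative, non-commutative composition that obeys an interchange law, block concatenation along two array axes works whenever shapes are compatible; the sketch below only checks the stated algebraic laws numerically and is not the paper's construction.

```python
import numpy as np

# Axis-specific composition operators: horizontal and vertical block concatenation.
# Each is associative and non-commutative along its own axis, and the two operators
# satisfy the interchange law
#   (a o1 b) o2 (c o1 d) == (a o2 c) o1 (b o2 d)
# whenever the block shapes are compatible.
def comp_axis1(a, b):   # o_1: concatenate along columns
    return np.concatenate([a, b], axis=1)

def comp_axis2(a, b):   # o_2: concatenate along rows
    return np.concatenate([a, b], axis=0)

rng = np.random.default_rng(0)
a, b, c, d = (rng.normal(size=(2, 2)) for _ in range(4))

# Associativity along one axis, and non-commutativity:
assert np.allclose(comp_axis1(comp_axis1(a, b), c), comp_axis1(a, comp_axis1(b, c)))
assert not np.allclose(comp_axis1(a, b), comp_axis1(b, a))

# Interchange law across axes:
lhs = comp_axis2(comp_axis1(a, b), comp_axis1(c, d))
rhs = comp_axis1(comp_axis2(a, c), comp_axis2(b, d))
assert np.allclose(lhs, rhs)
print("interchange law holds for block concatenation")
```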
- Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets.
We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance.
This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z)
- A Survey of Model Architectures in Information Retrieval [64.75808744228067]
We focus on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation.
We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent large language models (LLMs).
We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.
arXiv Detail & Related papers (2025-02-20T18:42:58Z)
- Interpretable deformable image registration: A geometric deep learning perspective [9.13809412085203]
We present a theoretical foundation for designing an interpretable registration framework.
We formulate an end-to-end process that refines transformations in a coarse-to-fine fashion.
We conclude by showing significant improvement in performance metrics over state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-17T19:47:10Z)
- Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient [0.49478969093606673]
We introduce refined variants of the Local Learning Coefficient (LLC), a measure of model complexity grounded in singular learning theory.
We study the development of internal structure in transformer language models during training.
arXiv Detail & Related papers (2024-10-03T20:51:02Z)
- Compositional Structures in Neural Embedding and Interaction Decompositions [101.40245125955306]
We describe a basic correspondence between linear algebraic structures within vector embeddings in artificial neural networks.
We introduce a characterization of compositional structures in terms of "interaction decompositions".
We establish necessary and sufficient conditions for the presence of such structures within the representations of a model.
arXiv Detail & Related papers (2024-07-12T02:39:50Z)
- The Buffer Mechanism for Multi-Step Information Reasoning in Language Models [52.77133661679439]
Investigating internal reasoning mechanisms of large language models can help us design better model architectures and training strategies.
In this study, we constructed a symbolic dataset to investigate the mechanisms by which Transformer models employ a vertical thinking strategy.
We proposed a random matrix-based algorithm to enhance the model's reasoning ability, resulting in a 75% reduction in the training time required for the GPT-2 model.
arXiv Detail & Related papers (2024-05-24T07:41:26Z)
- On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name CoRelNet.
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
A hedged sketch of similarity-distribution scoring follows this entry.
arXiv Detail & Related papers (2022-06-09T16:24:01Z)
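The summary names similarity-distribution scores as the core of CoRelNet but does not spell them out. As a rough, hedged reading, one can compute pairwise similarities among object embeddings and normalize each row into a probability distribution; the sketch below illustrates only that reading, with a hypothetical helper name, and omits whatever downstream readout CoRelNet actually uses.

```python
import numpy as np

def similarity_distribution_scores(objects: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the pairwise inner-product similarity matrix.
    `objects` has shape (n_objects, dim); each output row sums to 1."""
    sims = objects @ objects.T                       # (n, n) similarity matrix
    sims -= sims.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    weights = np.exp(sims)
    return weights / weights.sum(axis=1, keepdims=True)

# Toy usage: four random object embeddings.
rng = np.random.default_rng(0)
objs = rng.normal(size=(4, 16))
scores = similarity_distribution_scores(objs)
print(scores.round(3))   # each row is a distribution over similarities to the other objects
```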
- Adaptive Interaction Modeling via Graph Operations Search [109.45125932109454]
We automate the process of structure design to learn adaptive structures for interaction modeling.
We experimentally demonstrate that our architecture search framework learns to construct adaptive interaction modeling structures.
Our method achieves performance competitive with state-of-the-art approaches.
arXiv Detail & Related papers (2020-05-05T13:01:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.