Composing Linear Layers from Irreducibles
- URL: http://arxiv.org/abs/2507.11688v2
- Date: Sun, 20 Jul 2025 03:19:28 GMT
- Title: Composing Linear Layers from Irreducibles
- Authors: Travis Pence, Daisuke Yamada, Vikas Singh
- Abstract summary: We show that linear layers can be expressed as compositions of bivectors. We introduce a differentiable algorithm that decomposes them into products of rotors.
- Score: 21.94216765677867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary large models often exhibit behaviors suggesting the presence of low-level primitives that compose into modules with richer functionality, but these fundamental building blocks remain poorly understood. We investigate this compositional structure in linear layers by asking: can we identify/synthesize linear transformations from a minimal set of geometric primitives? Using Clifford algebra, we show that linear layers can be expressed as compositions of bivectors -- geometric objects encoding oriented planes -- and introduce a differentiable algorithm that decomposes them into products of rotors. This construction uses only O(log^2 d) parameters, versus O(d^2) required by dense matrices. Applied to the key, query, and value projections in LLM attention layers, our rotor-based layers match the performance of strong baselines such as block-Hadamard and low-rank approximations. Our findings provide an algebraic perspective on how these geometric primitives can compose into higher-level functions within deep models.
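As a rough illustration of the core idea (a minimal sketch, not the paper's actual algorithm): a rotor generated by a coordinate bivector e_i ∧ e_j acts on vectors as a plane (Givens) rotation, so a structured linear map can be assembled by composing a few such rotations with a diagonal scaling. The function names, the fixed choice of planes, and the added diagonal scaling below are illustrative assumptions.

```python
import numpy as np

def rotor_matrix(d, i, j, theta):
    # Matrix of the sandwich action v -> R v ~R for the rotor
    # R = exp(-theta/2 * e_i e_j): a plane (Givens) rotation by theta
    # in the (i, j) coordinate plane (sign conventions vary by source).
    R = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i], R[j, j] = c, c
    R[i, j], R[j, i] = -s, s
    return R

def rotor_composed_layer(d, planes, thetas, scales):
    # Compose one plane rotation per bivector with a diagonal scaling.
    # The (i, j) pairs, angles, and scales are the only parameters,
    # far fewer than the d*d entries of a dense weight matrix.
    W = np.diag(scales)
    for (i, j), theta in zip(planes, thetas):
        W = rotor_matrix(d, i, j, theta) @ W
    return W

# Toy usage on a d = 8 input.
d = 8
rng = np.random.default_rng(0)
planes = [(0, 1), (2, 3), (4, 5), (6, 7), (1, 2), (5, 6)]  # hypothetical plane choices
thetas = rng.uniform(-np.pi, np.pi, size=len(planes))
scales = rng.uniform(0.5, 1.5, size=d)
W = rotor_composed_layer(d, planes, thetas, scales)
y = W @ rng.standard_normal(d)  # apply the structured layer to one input
```

In this toy setup the layer is parameterized by a handful of plane indices, angles, and scales rather than a full d x d weight matrix, which is the flavor of parameter saving the abstract describes.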
Related papers
- Cover Learning for Large-Scale Topology Representation [0.0]
We describe a method for learning topologically-faithful covers of geometric datasets. We show that the simplicial complexes thus obtained can outperform standard topological inference approaches in terms of size.
arXiv Detail & Related papers (2025-03-12T19:10:20Z)
- Point Cloud Synthesis Using Inner Product Transforms [13.608942872770855]
We develop a novel method that encodes geometrical-topological characteristics of point clouds using inner products. Our encoding exhibits high quality in typical tasks like reconstruction, generation, and inference, with inference times orders of magnitude faster than existing methods.
arXiv Detail & Related papers (2024-10-09T17:19:22Z)
- Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices [88.33936714942996]
We present a unifying framework that enables searching among all linear operators expressible via an Einstein summation.
We show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables.
We also introduce a Mixture-of-Experts (MoE) architecture that learns an MoE in every single linear layer of the model, including the projections in the attention blocks.
arXiv Detail & Related papers (2024-10-03T00:44:50Z)
- Weight-based Decomposition: A Case for Bilinear MLPs [0.0]
Gated Linear Units (GLUs) have become a common building block in modern foundation models.
Bilinear layers drop the non-linearity in the "gate" but still have comparable performance to other GLUs.
We develop a method to decompose the bilinear tensor into a set of interacting eigenvectors.
arXiv Detail & Related papers (2024-06-06T10:46:51Z)
- A technical note on bilinear layers for interpretability [0.0]
Bilinear layers are a type of layer that is mathematically much easier to analyze.
We can integrate this expression for bilinear layers into a mathematical framework for transformer circuits.
arXiv Detail & Related papers (2023-05-05T11:56:26Z)
- Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs).
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
arXiv Detail & Related papers (2023-02-28T08:11:56Z)
- Geometric Clifford Algebra Networks [53.456211342585824]
We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems.
GCANs are based on symmetry group transformations using geometric (Clifford) algebras.
arXiv Detail & Related papers (2023-02-13T18:48:33Z)
- Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
arXiv Detail & Related papers (2021-10-28T14:59:41Z)
- Geometry of Linear Convolutional Networks [7.990816079551592]
We study the family of functions represented by a linear convolutional neural network (LCN).
We study the optimization of an objective function over an LCN, analyzing critical points in function space and in gradient space.
Overall, our theory predicts that the optimized parameters of an LCN will often correspond to repeated filters across layers.
arXiv Detail & Related papers (2021-08-03T14:42:18Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
- Learning from Protein Structure with Geometric Vector Perceptrons [6.5360079597553025]
We introduce geometric vector perceptrons, which extend standard dense layers to operate on collections of Euclidean vectors.
We demonstrate our approach on two important problems in learning from protein structure: model quality assessment and computational protein design.
arXiv Detail & Related papers (2020-09-03T01:54:25Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.