Structural Disentanglement in Bilinear MLPs via Architectural Inductive Bias
- URL: http://arxiv.org/abs/2602.05635v1
- Date: Thu, 05 Feb 2026 13:14:01 GMT
- Title: Structural Disentanglement in Bilinear MLPs via Architectural Inductive Bias
- Authors: Ojasva Nema, Kaustubh Sharma, Aditya Chauhan, Parikshit Pareek,
- Abstract summary: We argue that failures arise from how models structure their internal representations during training.<n>We show analytically that bilinear parameterizations possess a non-mixing' property under gradient flow conditions.<n>Unlike pointwise nonlinear networks, multiplicative architectures are able to recover true operators aligned with the underlying algebraic structure.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Selective unlearning and long-horizon extrapolation remain fragile in modern neural networks, even when tasks have underlying algebraic structure. In this work, we argue that these failures arise not solely from optimization or unlearning algorithms, but from how models structure their internal representations during training. We explore if having explicit multiplicative interactions as an architectural inductive bias helps in structural disentanglement, through Bilinear MLPs. We show analytically that bilinear parameterizations possess a `non-mixing' property under gradient flow conditions, where functional components separate into orthogonal subspace representations. This provides a mathematical foundation for surgical model modification. We validate this hypothesis through a series of controlled experiments spanning modular arithmetic, cyclic reasoning, Lie group dynamics, and targeted unlearning benchmarks. Unlike pointwise nonlinear networks, multiplicative architectures are able to recover true operators aligned with the underlying algebraic structure. Our results suggest that model editability and generalization are constrained by representational structure, and that architectural inductive bias plays a central role in enabling reliable unlearning.
Related papers
- Coalgebras for categorical deep learning: Representability and universal approximation [0.0]
We develop a coalgebraic foundation for equivariant representation in deep learning.<n>We establish a universal approximation theorem for equivariant maps in a generalized setting.<n>This work provides a categorical bridge between the abstract specification of invariant behavior and its concrete realization in neural architectures.
arXiv Detail & Related papers (2026-03-03T18:18:50Z) - Geometric Laplace Neural Operator [12.869633759181417]
We propose a generalized operator learning framework based on a pole-residue decomposition enriched with exponential basis functions.<n>We introduce the Geometric Laplace Neural Operator (GLNO), which embeds the Laplace spectral representation into the eigen-basis of the Laplace-Beltrami operator.<n>We further design a grid-invariant network architecture (GLNONet) that realizes GLNO in practice.
arXiv Detail & Related papers (2025-12-18T11:07:41Z) - Information-Theoretic Bounds and Task-Centric Learning Complexity for Real-World Dynamic Nonlinear Systems [0.6875312133832079]
Dynamic nonlinear systems exhibit distortions arising from coupled static and dynamic effects.<n>This paper presents a theoretical framework grounded in structured decomposition, variance analysis, and task-centric complexity bounds.
arXiv Detail & Related papers (2025-09-08T12:08:02Z) - Cross-Model Semantics in Representation Learning [1.2064681974642195]
We show that structural regularities induce representational geometry that is more stable under architectural variation.<n>This suggests that certain forms of inductive bias not only support generalization within a model, but also improve the interoperability of learned features across models.
arXiv Detail & Related papers (2025-08-05T16:57:24Z) - Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures [0.0]
We develop a category-theoretic framework focusing on the linear components of self-attention.<n>We show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category $mathbfPara(Vect)$.<n> stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor.
arXiv Detail & Related papers (2025-01-06T11:14:18Z) - Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores which we name Compositional Network generalization (CoRelNet)
We find that simple architectural choices can outperform existing models in out-of-distribution generalizations.
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Convolutional Filtering and Neural Networks with Non Commutative
Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z) - LieTransformer: Equivariant self-attention for Lie Groups [49.9625160479096]
Group equivariant neural networks are used as building blocks of group invariant neural networks.
We extend the scope of the literature to self-attention, that is emerging as a prominent building block of deep learning models.
We propose the LieTransformer, an architecture composed of LieSelfAttention layers that are equivariant to arbitrary Lie groups and their discrete subgroups.
arXiv Detail & Related papers (2020-12-20T11:02:49Z) - Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs) which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.