Unified Sparse-Matrix Representations for Diverse Neural Architectures
- URL: http://arxiv.org/abs/2506.01966v3
- Date: Tue, 22 Jul 2025 20:07:02 GMT
- Title: Unified Sparse-Matrix Representations for Diverse Neural Architectures
- Authors: Yuzhou Zhu
- Abstract summary: We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. This work establishes a mathematically rigorous substrate for diverse neural architectures and opens avenues for principled, hardware-aware network design.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks employ specialized architectures for vision, sequential and language tasks, yet this proliferation obscures their underlying commonalities. We introduce a unified matrix-order framework that casts convolutional, recurrent and self-attention operations as sparse matrix multiplications. Convolution is realized via an upper-triangular weight matrix performing first-order transformations; recurrence emerges from a lower-triangular matrix encoding stepwise updates; attention arises naturally as a third-order tensor factorization. We prove algebraic isomorphism with standard CNN, RNN and Transformer layers under mild assumptions. Empirical evaluations on image classification (MNIST, CIFAR-10/100, Tiny ImageNet), time-series forecasting (ETTh1, Electricity Load Diagrams) and language modeling/classification (AG News, WikiText-2, Penn Treebank) confirm that sparse-matrix formulations match or exceed native model performance while converging in comparable or fewer epochs. By reducing architecture design to sparse pattern selection, our matrix perspective aligns with GPU parallelism and leverages mature algebraic optimization tools. This work establishes a mathematically rigorous substrate for diverse neural architectures and opens avenues for principled, hardware-aware network design.
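To make the paper's central reduction concrete, here is a minimal NumPy sketch (illustrative only, not the authors' code) of a 1-D convolution realized as a banded sparse weight matrix and of self-attention restricted by a lower-triangular (causal) sparsity pattern; the paper's exact upper-/lower-triangular constructions and third-order tensor factorization may differ in detail.

```python
import numpy as np

def conv1d_as_matrix(kernel, n):
    """Banded (Toeplitz-style) matrix whose product with a length-n signal
    equals a 'valid' 1-D cross-correlation: convolution as sparse matmul."""
    k = len(kernel)
    W = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        W[i, i:i + k] = kernel          # each row: a shifted copy of the kernel
    return W

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
kernel = np.array([1.0, -2.0, 1.0])
W = conv1d_as_matrix(kernel, len(x))
assert np.allclose(W @ x, np.convolve(x, kernel[::-1], mode="valid"))

# Self-attention with a lower-triangular (causal) sparsity pattern:
d = 4
Q, K, V = (rng.standard_normal((8, d)) for _ in range(3))
scores = Q @ K.T / np.sqrt(d)
scores[np.triu(np.ones((8, 8), dtype=bool), k=1)] = -np.inf  # mask the future
A = np.exp(scores)
A /= A.sum(axis=1, keepdims=True)       # row-wise softmax; masked entries are 0
out = A @ V                             # attention as (masked) matrix products
```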
Related papers
- Weight Conditioning for Smooth Optimization of Neural Networks [28.243353447978837]
We introduce a novel normalization technique for neural network weight matrices, which we term weight conditioning.
This approach aims to narrow the gap between the smallest and largest singular values of the weight matrices, resulting in better-conditioned matrices.
Our findings indicate that our normalization method is not only competitive but also outperforms existing weight normalization techniques from the literature.
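As a rough illustration of the idea (the paper's actual conditioner is not specified in this summary), one can shrink the singular-value spread of a weight matrix directly via the SVD:

```python
import numpy as np

def condition_weights(W, target_cond=10.0):
    """Hypothetical conditioner: clip singular values from below so that
    sigma_max / sigma_min <= target_cond. The paper's scheme may differ."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = np.clip(s, s.max() / target_cond, None)   # raise tiny singular values
    return (U * s) @ Vt

W = np.random.default_rng(1).standard_normal((64, 32))
print(np.linalg.cond(W), "->", np.linalg.cond(condition_weights(W)))
```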
arXiv Detail & Related papers (2024-09-05T11:10:34Z)
- Optimal Matrix-Mimetic Tensor Algebras via Variable Projection [0.0]
Matrix mimeticity arises from interpreting tensors as operators that can be multiplied, factorized, and analyzed analogously to matrices.
We learn optimal linear mappings and corresponding tensor representations without relying on prior knowledge of the data.
We provide original theory of uniqueness of the transformation and convergence analysis of our variable-projection-based algorithm.
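For context, the classical fixed-transform instance of matrix mimeticity is the t-product, in which frontal slices are multiplied in a transform domain; a small NumPy sketch follows (the paper instead learns the transform, which this example does not do):

```python
import numpy as np

def t_product(A, B):
    """t-product of tensors A (m x p x n) and B (p x l x n): transform along
    the third mode (here the fixed DFT; the paper learns this map instead),
    multiply frontal slices, then transform back."""
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    Ch = np.einsum('ipk,plk->ilk', Ah, Bh)   # slice-wise matrix products
    return np.real(np.fft.ifft(Ch, axis=2))

rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 4, 5)), rng.standard_normal((4, 2, 5))
print(t_product(A, B).shape)   # (3, 2, 5)
```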
arXiv Detail & Related papers (2024-06-11T04:52:23Z)
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks.
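The simplest member of this design space is a block-diagonal weight matrix, which already shows how structure trades parameters for compute; a minimal sketch (Monarch and Block Tensor-Train matrices compose richer factors than this):

```python
import numpy as np

def block_diag_layer(x, blocks):
    """Structured replacement for a dense layer: a block-diagonal weight
    matrix with k blocks of size b costs O(d*b) FLOPs per input instead of
    O(d^2) for dense, where d = k * b."""
    k, b, _ = blocks.shape
    xs = x.reshape(-1, k, b)
    return np.einsum('nkb,kbc->nkc', xs, blocks).reshape(x.shape[0], k * b)

rng = np.random.default_rng(3)
x = rng.standard_normal((16, 64))
blocks = rng.standard_normal((8, 8, 8))   # 8 learned blocks of size 8x8
print(block_diag_layer(x, blocks).shape)  # (16, 64)
```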
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs).
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models of Lattice Boltzmann collision operators.
Our work opens the way towards practical use of machine-learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z)
- NeoNeXt: Novel neural network operator and architecture based on the patch-wise matrix multiplications [0.0]
We propose a novel foundation operation - NeoCell - which learns matrix patterns and performs patchwise matrix multiplications with the input data.
The main advantages of the proposed operator are (1) simple implementation without the need for operations like im2col, (2) low computational complexity (especially for large matrices) and (3) simple and flexible implementation of up-/down-sampling.
We validate the NeoNeXt family of models based on this operation on the ImageNet-1K classification task and show that they achieve competitive quality.
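A minimal sketch of a patch-wise matrix-multiplication operator in this spirit (the function name and the exact multiplication pattern are illustrative assumptions, not the NeoCell definition):

```python
import numpy as np

def patchwise_matmul(x, W):
    """Split an (H, W) map into non-overlapping p x p patches and left-multiply
    each by a learned p x p matrix; no im2col buffer is ever materialized."""
    p = W.shape[0]
    H, Wd = x.shape
    patches = x.reshape(H // p, p, Wd // p, p)       # (row-blk, p, col-blk, p)
    out = np.einsum('ij,ajbk->aibk', W, patches)     # W @ patch, for every patch
    return out.reshape(H, Wd)

rng = np.random.default_rng(4)
x, W = rng.standard_normal((32, 32)), rng.standard_normal((4, 4))
print(patchwise_matmul(x, W).shape)   # (32, 32)
```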
arXiv Detail & Related papers (2024-03-17T15:51:21Z)
- A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
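The SDP machinery itself is beyond a short snippet, but the property being enforced is easy to state: a linear map is 1-Lipschitz whenever its spectral norm is at most one. A much simpler construction than SLL that guarantees this, shown only to make the property concrete:

```python
import numpy as np

def one_lipschitz_linear(W, x):
    """1-Lipschitz map via spectral normalization: ||Wx - Wy|| <= ||x - y||
    once W is divided by its largest singular value. (SLL layers instead
    certify 1-Lipschitz residual blocks through an SDP condition.)"""
    return (W / np.linalg.norm(W, 2)) @ x
```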
arXiv Detail & Related papers (2023-03-06T14:31:09Z)
- Equivariant neural networks for recovery of Hadamard matrices [0.7742297876120561]
We propose a message passing neural network architecture designed to be equivariant to column and row permutations of a matrix.
We illustrate its advantages over traditional architectures like multi-layer perceptrons (MLPs), convolutional neural networks (CNNs) and even Transformers.
arXiv Detail & Related papers (2022-01-31T12:07:07Z)
- A Deep Generative Model for Matrix Reordering [26.86727566323601]
We develop a generative model that learns a latent space of diverse matrix reorderings of a graph.
We construct an intuitive user interface from the learned latent space by creating a map of various matrix reorderings.
This paper introduces a fundamentally new approach to matrix visualization of a graph, where a machine learning model learns to generate diverse matrix reorderings of a graph.
arXiv Detail & Related papers (2021-10-11T02:55:24Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
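The core recipe is an average of the backbone over a frame; a toy sketch in which the frame is an entire small group, the simplest valid case (FA's point is that much smaller, input-dependent frames suffice):

```python
import numpy as np

def frame_average(phi, x, frames):
    """Make an arbitrary backbone `phi` invariant by averaging it over a
    frame; here the frame is a full (small) group of transformations."""
    return np.mean([phi(g @ x) for g in frames], axis=0)

# Toy example: invariance to sign flips of a 2-D input.
group = [np.diag(d) for d in ([1, 1], [1, -1], [-1, 1], [-1, -1])]
phi = lambda v: np.tanh(v).sum()          # any non-invariant backbone
x = np.array([0.3, -0.7])
assert np.isclose(frame_average(phi, x, group),
                  frame_average(phi, -x, group))
```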
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
- X-volution: On the unification of convolution and self-attention [52.80459687846842]
We propose a multi-branch elementary module composed of both convolution and self-attention operations.
The proposed X-volution achieves highly competitive improvements on visual understanding tasks.
arXiv Detail & Related papers (2021-06-04T04:32:02Z)
- Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model produces significant improvement in 3D shape reconstruction compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)