Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
- URL: http://arxiv.org/abs/2402.15332v2
- Date: Thu, 6 Jun 2024 00:58:55 GMT
- Title: Position: Categorical Deep Learning is an Algebraic Theory of All Architectures
- Authors: Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković
- Abstract summary: We present our position on the quest for a general-purpose framework for specifying and studying deep learning architectures.
We propose to apply category theory as a single theory elegantly subsuming both flavours of neural network design: specifying the constraints models must satisfy and specifying their implementations.
We show how this theory recovers the constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks.
- Score: 1.772996392520906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the universal algebra of monads valued in a 2-category of parametric maps -- as a single theory elegantly subsuming both of these flavours of neural network design. To defend our position, we show how this theory recovers constraints induced by geometric deep learning, as well as implementations of many architectures drawn from the diverse landscape of neural networks, such as RNNs. We also illustrate how the theory naturally encodes many standard constructs in computer science and automata theory.
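The paper's leading example is that recurrent networks arise as algebras in a 2-category of parametric maps. Below is a minimal Python sketch of that idea, with all names and sizes our own (hypothetical, not taken from the paper): an RNN cell as a parametric structure map cell : P x A x H -> H, whose fold over an input list is exactly the usual RNN unrolling.

```python
import numpy as np

def rnn_cell(params, x, h):
    """Structure map of the algebra: consume one input, update the state."""
    W, U, b = params
    return np.tanh(W @ x + U @ h + b)

def fold(cell, params, h0, xs):
    """The unique map out of the free (list) algebra: folding the cell over
    the input sequence reproduces the usual RNN unrolling."""
    h = h0
    for x in xs:
        h = cell(params, x, h)
    return h

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                   # hypothetical sizes
params = (rng.normal(size=(d_h, d_in)),            # W: input weights
          rng.normal(size=(d_h, d_h)),             # U: recurrent weights
          np.zeros(d_h))                           # b: bias
xs = [rng.normal(size=d_in) for _ in range(5)]     # an input sequence
print(fold(rnn_cell, params, np.zeros(d_h), xs))   # final hidden state
```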
Related papers
- Towards a Categorical Foundation of Deep Learning: A Survey [0.0]
This thesis is a survey covering recent work that studies machine learning categorically.
Acting as a lingua franca of mathematics and science, category theory might be able to give a unifying structure to the field of machine learning.
arXiv Detail & Related papers (2024-10-07T13:11:16Z)
- Fundamental Components of Deep Learning: A category-theoretic approach [0.0]
This thesis develops a novel mathematical foundation for deep learning based on the language of category theory.
We also systematise many existing approaches, placing many existing constructions and concepts under the same umbrella.
arXiv Detail & Related papers (2024-03-13T01:29:40Z)
- Hierarchical Invariance for Robust and Interpretable Vision Tasks at Larger Scales [54.78115855552886]
We show how to construct over-complete invariants with a Convolutional Neural Network (CNN)-like hierarchical architecture.
Thanks to this over-completeness, discriminative features for the task at hand can be formed adaptively, in a Neural Architecture Search (NAS)-like manner.
For robust and interpretable vision tasks at larger scales, hierarchical invariant representations can be considered an effective alternative to traditional CNNs and invariants.
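The simplest recipe for building an invariant, averaging a feature map over a group orbit, illustrates the entry's core device. The sketch below is our own illustration (not the paper's construction): averaging an arbitrary linear feature over the four 90-degree rotations of a patch yields a rotation-invariant output by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))          # an arbitrary, non-invariant feature map

def feature(patch):
    return W @ patch.ravel()

def rotation_invariant(patch):
    # Averaging the feature over the group orbit (the four 90-degree
    # rotations) makes the output invariant to rotating the input.
    return np.mean([feature(np.rot90(patch, k)) for k in range(4)], axis=0)

patch = np.arange(16.0).reshape(4, 4)
print(np.allclose(rotation_invariant(patch),
                  rotation_invariant(np.rot90(patch))))   # True
```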
arXiv Detail & Related papers (2024-02-23T16:50:07Z)
- A Structural Approach to the Design of Domain Specific Neural Network Architectures [0.0]
This thesis aims to provide a theoretical evaluation of geometric deep learning.
It compiles theoretical results that characterize the properties of invariant neural networks with respect to learning performance.
arXiv Detail & Related papers (2023-01-23T11:50:57Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
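A quick way to see the phenomenon this entry describes is to track the numerical rank of a batch's feature matrix through a network. The sketch below is our own demonstration under assumed conditions (a random, untrained ReLU network), where the rank typically decays with depth:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 256, 64, 10
X = rng.normal(size=(n, d))                    # feature matrix of a batch
for layer in range(1, depth + 1):
    W = rng.normal(size=(d, d)) / np.sqrt(d)   # random untrained weights
    X = np.maximum(X @ W, 0.0)                 # ReLU layer
    tol = 1e-3 * np.linalg.norm(X, 2)          # scale-relative tolerance
    r = np.linalg.matrix_rank(X, tol=tol)
    print(f"layer {layer:2d}: numerical rank = {r}")
```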
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Unified Field Theory for Deep and Recurrent Neural Networks [56.735884560668985]
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$, where $n$ is the network width.
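The lowest order of that $1/n$ expansion is the well-known neural-network Gaussian process. A small sketch of this leading-order result, under assumptions of our own (ReLU activations, He-scaled weights): the GP kernel of a deep network follows a simple layerwise recursion, the arc-cosine kernel.

```python
import numpy as np

def relu_kernel_step(K):
    """One layer of the NNGP recursion for ReLU with He-scaled weights."""
    d = np.sqrt(np.outer(np.diag(K), np.diag(K)))
    c = np.clip(K / d, -1.0, 1.0)              # correlation matrix
    theta = np.arccos(c)
    return d * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                    # 5 inputs in 3 dimensions
K = X @ X.T / X.shape[1]                       # input kernel
for _ in range(4):                             # a 4-layer network
    K = relu_kernel_step(K)
print(np.round(K, 3))                          # kernel of the output layer
```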
arXiv Detail & Related papers (2021-12-10T15:06:11Z)
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [50.22269760171131]
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods.
This text is concerned with exposing pre-defined regularities through unified geometric principles.
It provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers.
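The blueprint's unifying notion is equivariance: a layer should commute with the action of the relevant symmetry group. As a minimal sketch of our own (not from the text), the check below verifies permutation equivariance of a toy message-passing graph layer.

```python
import numpy as np

def gnn_layer(A, X, W):
    """One graph layer: aggregate neighbours, then a shared linear map."""
    return np.maximum((A @ X) @ W, 0.0)

rng = np.random.default_rng(0)
n, d = 6, 4
A = (rng.random((n, n)) < 0.4).astype(float)   # random adjacency matrix
X = rng.normal(size=(n, d))                    # node features
W = rng.normal(size=(d, d))                    # shared weights

P = np.eye(n)[rng.permutation(n)]              # a permutation matrix
lhs = gnn_layer(P @ A @ P.T, P @ X, W)         # relabel the graph first
rhs = P @ gnn_layer(A, X, W)                   # relabel the output instead
print(np.allclose(lhs, rhs))                   # True: the layer is equivariant
```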
arXiv Detail & Related papers (2021-04-27T21:09:51Z)
- The Autodidactic Universe [0.8795040582681388]
We present an approach to cosmology in which the Universe learns its own physical laws.
We discover maps that put each of these matrix models in correspondence with both a gauge/gravity theory and a mathematical model of a learning machine.
We discuss in detail what it means to say that learning takes place in autodidactic systems, where there is no supervision.
arXiv Detail & Related papers (2021-03-29T02:25:02Z)
- Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized into six groups, of which this summary lists four: (1) complexity- and capacity-based approaches for analysing the generalizability of deep learning; (2) differential equations and their dynamical systems for modelling gradient descent and its variants; (3) the geometrical structure of the loss landscape that drives the trajectories of these dynamical systems; and (5) theoretical foundations of several special structures in network architectures.
arXiv Detail & Related papers (2020-12-20T14:16:41Z)
- Understanding Deep Architectures with Reasoning Layer [60.90906477693774]
We show that properties of the algorithm layers, such as convergence, stability, and sensitivity, are intimately related to the approximation and generalization abilities of the end-to-end model.
Our theory can provide useful guidelines for designing deep architectures with reasoning layers.
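One common instance of an algorithm layer is unrolled optimization. The sketch below is a hedged illustration under our own assumptions (a quadratic objective, plain gradient descent, not the paper's setup): the layer's output approaches the exact solution as the number of unrolled steps k grows, the kind of convergence property this entry relates to end-to-end behaviour.

```python
import numpy as np

def reasoning_layer(Q, b, k, lr):
    """Unroll k gradient steps on f(y) = 0.5 * y^T Q y - b^T y from y = 0."""
    y = np.zeros_like(b)
    for _ in range(k):
        y = y - lr * (Q @ y - b)       # one gradient-descent step
    return y

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
Q = M @ M.T + np.eye(4)                # a positive-definite objective
b = rng.normal(size=4)
lr = 1.0 / np.linalg.norm(Q, 2)        # step size chosen so iteration converges
for k in (1, 5, 50):
    err = np.linalg.norm(reasoning_layer(Q, b, k, lr) - np.linalg.solve(Q, b))
    print(f"k = {k:2d}: distance to exact solution = {err:.5f}")
```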
arXiv Detail & Related papers (2020-06-24T00:26:35Z)