Parametric machines: a fresh approach to architecture search
- URL: http://arxiv.org/abs/2007.02777v2
- Date: Wed, 8 Jul 2020 16:24:55 GMT
- Title: Parametric machines: a fresh approach to architecture search
- Authors: Pietro Vertechi, Patrizio Frosini, Mattia G. Bergomi
- Abstract summary: We show how simple machines can be combined into more complex ones.
We explore finite- and infinite-depth machines, which generalize neural networks and neural ordinary differential equations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using tools from category theory, we provide a framework where artificial
neural networks, and their architectures, can be formally described. We first
define the notion of machine in a general categorical context, and show how
simple machines can be combined into more complex ones. We explore finite- and
infinite-depth machines, which generalize neural networks and neural ordinary
differential equations. Borrowing ideas from functional analysis and kernel
methods, we build complete, normed, infinite-dimensional spaces of machines,
and discuss how to find optimal architectures and parameters -- within those
spaces -- to solve a given computational problem. In our numerical experiments,
these kernel-inspired networks can outperform classical neural networks when
the training dataset is small.
 
      
Related papers
- Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning [78.88684753303794]
 Deep learning has predominantly advanced through applications in computer vision and natural language processing.
Neural operators are a principled way to generalize neural networks to mappings between function spaces.
This paper identifies and distills the key principles for constructing practical implementations of mappings between infinite-dimensional function spaces.
 arXiv (2025-06-12T17:59:31Z)
- NNTile: a machine learning framework capable of training extremely large GPT language models on a single node [83.9328245724548]
 NNTile is based on a StarPU library, which implements task-based parallelism and schedules all provided tasks onto all available processing units.
It means that a particular operation, necessary to train a large neural network, can be performed on any of the CPU cores or GPU devices.
 arXiv (2025-04-17T16:22:32Z)
- Analog Alchemy: Neural Computation with In-Memory Inference, Learning and Routing [0.08965418284317034]
 I explore an alternative way with memristive devices for neural computation, where the unique physical dynamics of the devices are used for inference, learning and routing.
I will provide hardware evidence of adaptability of local learning to memristive substrates, new material stacks and circuit blocks that aid in solving the credit assignment problem and efficient routing between analog crossbars for scalable architectures.
 arXiv (2024-12-30T10:35:03Z)
- Structure of Artificial Neural Networks -- Empirical Investigations [0.0]
 Within one decade, Deep Learning overtook the dominating solution methods of countless problems of artificial intelligence.
With a formal definition for structures of neural networks, neural architecture search problems and solution methods can be formulated under a common framework.
Does structure make a difference or can it be chosen arbitrarily?
 arXiv (2024-10-12T16:13:28Z)
- Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
 We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
 arXiv (2024-02-20T15:23:24Z)
- Higher-order topological kernels via quantum computation [68.8204255655161]
 Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data.
We propose a quantum approach to defining Betti kernels, which is based on constructing Betti curves with increasing order.
 arXiv (2023-07-14T14:48:52Z)
- Building artificial neural circuits for domain-general cognition: a primer on brain-inspired systems-level architecture [0.0]
 We provide an overview of the hallmarks endowing biological neural networks with the functionality needed for flexible cognition.
As machine learning models become more complex, these principles may provide valuable directions in an otherwise vast space of possible architectures.
 arXiv (2023-03-21T18:36:17Z)
- Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
 In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
 arXiv (2022-08-11T20:17:02Z)
- Machines of finite depth: towards a formalization of neural networks [0.0]
 We provide a unifying framework where artificial neural networks and their architectures can be formally described as particular cases of a general mathematical construction: machines of finite depth.
We prove this statement theoretically and practically, via a unified implementation that generalizes several classical architectures (dense, convolutional, and recurrent neural networks with a rich shortcut structure) and their respective backpropagation rules.
 arXiv (2022-04-27T09:17:15Z)
- Inducing Gaussian Process Networks [80.40892394020797]
 We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
 arXiv (2022-04-21T05:27:09Z)
- Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
 We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
 arXiv (2022-03-30T21:47:32Z)
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [50.22269760171131]
 The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods.
This text is concerned with exposing pre-defined regularities through unified geometric principles.
It provides a common mathematical framework to study the most successful neural network architectures, such as CNNs, RNNs, GNNs, and Transformers.
 arXiv (2021-04-27T21:09:51Z)
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
 Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
 arXiv (2020-05-04T13:24:00Z)
- On the computational power and complexity of Spiking Neural Networks [0.0]
 We introduce spiking neural networks as a machine model where, in contrast to the familiar Turing machine, information and the manipulation thereof are co-located in the machine.
We introduce canonical problems, define hierarchies of complexity classes and provide some first completeness results.
 arXiv (2020-01-23T10:40:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     