Conditional computation in neural networks: principles and research trends
- URL: http://arxiv.org/abs/2403.07965v2
- Date: Mon, 8 Jul 2024 09:21:00 GMT
- Title: Conditional computation in neural networks: principles and research trends
- Authors: Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, Jary Pomponi
- Abstract summary: This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks.
In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input.
- Score: 48.14569369912931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This article summarizes principles and ideas from the emerging area of applying conditional computation methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or de-activate parts of their computational graph conditionally on their input. Examples include the dynamic selection of, e.g., input tokens, layers (or sets of layers), and sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoEs) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging applicative areas ranging from automated scientific discovery to semantic communication.
Related papers
- Identifying Sub-networks in Neural Networks via Functionally Similar Representations [41.028797971427124]
We take a step toward automating the understanding of the network by investigating the existence of distinct sub-networks.
Our approach offers meaningful insights into the behavior of neural networks with minimal human and computational cost.
arXiv Detail & Related papers (2024-10-21T20:19:00Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Feature emergence via margin maximization: case studies in algebraic tasks [4.401622714202886]
We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups.
More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
arXiv Detail & Related papers (2023-11-13T18:56:33Z) - When Deep Learning Meets Polyhedral Theory: A Survey [6.899761345257773]
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks.
Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions.
arXiv Detail & Related papers (2023-04-29T11:46:53Z) - Simple initialization and parametrization of sinusoidal networks via their kernel bandwidth [92.25666446274188]
Neural networks with sinusoidal activations have been proposed as an alternative to networks with traditional activation functions.
We first propose a simplified version of such sinusoidal neural networks, which allows both for easier practical implementation and simpler theoretical analysis.
We then analyze the behavior of these networks from the neural tangent kernel perspective and demonstrate that their kernel approximates a low-pass filter with an adjustable bandwidth.
arXiv Detail & Related papers (2022-11-26T07:41:48Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Analyzing Representations inside Convolutional Neural Networks [8.803054559188048]
We propose a framework to categorize the concepts a network learns based on the way it clusters a set of input examples.
This framework is unsupervised and can work without any labels for input features.
We extensively evaluate the proposed method and demonstrate that it produces human-understandable and coherent concepts.
arXiv Detail & Related papers (2020-12-23T07:10:17Z) - A Practical Tutorial on Graph Neural Networks [49.919443059032226]
Graph neural networks (GNNs) have recently grown in popularity in the field of artificial intelligence (AI).
This tutorial exposes the power and novelty of GNNs to AI practitioners.
arXiv Detail & Related papers (2020-10-11T12:36:17Z) - Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z) - Emergence of Network Motifs in Deep Neural Networks [0.35911228556176483]
We show that network science tools can be successfully applied to the study of artificial neural networks.
In particular, we study the emergence of network motifs in multi-layer perceptrons.
arXiv Detail & Related papers (2019-12-27T17:05:38Z)