Neural Attentive Circuits
- URL: http://arxiv.org/abs/2210.08031v1
- Date: Fri, 14 Oct 2022 18:00:07 GMT
- Title: Neural Attentive Circuits
- Authors: Nasim Rahaman and Martin Weiss and Francesco Locatello and Chris Pal
and Yoshua Bengio and Bernhard Schölkopf and Erran Li and Nicolas Ballas
- Abstract summary: We introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs)
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
- Score: 93.95502541529115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has seen the development of general purpose neural architectures
that can be trained to perform tasks across diverse data modalities. General
purpose models typically make few assumptions about the underlying
data-structure and are known to perform well in the large-data regime. At the
same time, there has been growing interest in modular neural architectures that
represent the data using sparsely interacting modules. These models can be more
robust out-of-distribution, computationally efficient, and capable of
sample-efficient adaptation to new data. However, they tend to make
domain-specific assumptions about the data, and present challenges in how
module behavior (i.e., parameterization) and connectivity (i.e., their layout)
can be jointly learned. In this work, we introduce a general purpose, yet
modular neural architecture called Neural Attentive Circuits (NACs) that
jointly learns the parameterization and a sparse connectivity of neural modules
without using domain knowledge. NACs are best understood as the combination of
two systems that are jointly trained end-to-end: one that determines the module
configuration and the other that executes it on an input. We demonstrate
qualitatively that NACs learn diverse and meaningful module configurations on
the NLVR2 dataset without additional supervision. Quantitatively, we show that
by incorporating modularity in this way, NACs improve upon a strong non-modular
baseline in terms of low-shot adaptation on the CIFAR and CUB datasets by about
10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that
NACs can achieve an 8x speedup at inference time while losing less than 3%
performance. Finally, we find NACs to yield competitive results on diverse data
modalities spanning point-cloud classification, symbolic processing and
text classification from ASCII bytes, thereby confirming their general purpose
nature.
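To make the two-system description in the abstract concrete, below is a minimal PyTorch sketch of the idea: one component (here called ConfigurationNet) produces per-module parameterizations and a sparse, attention-derived connectivity pattern, and a second component (ModularExecutor) runs the configured modules on the input. The class names, layer choices, and top-k routing rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a "configure + execute" modular network, assuming PyTorch.
# Class names and design details below are illustrative, not the NAC implementation.
import torch
import torch.nn as nn


class ConfigurationNet(nn.Module):
    """Produces per-module parameterizations (code vectors) and a sparse,
    row-normalized module-to-module connectivity pattern from learned seeds."""

    def __init__(self, num_modules: int, code_dim: int, topk: int):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(num_modules, code_dim))
        self.to_code = nn.Linear(code_dim, code_dim)
        self.topk = topk

    def forward(self):
        codes = self.to_code(self.seeds)                     # (M, D) module parameterizations
        scores = codes @ codes.t() / codes.shape[-1] ** 0.5  # (M, M) module-module affinities
        # Keep only the top-k incoming connections per module (sparse connectivity).
        mask = torch.full_like(scores, float("-inf"))
        topv, topi = scores.topk(self.topk, dim=-1)
        mask.scatter_(-1, topi, topv)
        connectivity = mask.softmax(dim=-1)                  # zeros outside the top-k entries
        return codes, connectivity


class ModularExecutor(nn.Module):
    """Runs the configured modules on an input: every module reads the input via
    cross-attention, then exchanges messages only along the sparse connectivity."""

    def __init__(self, input_dim: int, code_dim: int):
        super().__init__()
        self.read = nn.MultiheadAttention(code_dim, num_heads=4, kdim=input_dim,
                                          vdim=input_dim, batch_first=True)
        self.update = nn.Sequential(nn.Linear(code_dim, code_dim), nn.GELU(),
                                    nn.Linear(code_dim, code_dim))

    def forward(self, x, codes, connectivity):
        # x: (B, T, input_dim) input tokens; codes: (M, D); connectivity: (M, M)
        states = codes.unsqueeze(0).expand(x.shape[0], -1, -1)      # (B, M, D) module states
        states, _ = self.read(states, x, x)                         # modules attend to the input
        states = torch.einsum("mn,bnd->bmd", connectivity, states)  # sparse message passing
        return states + self.update(states)                         # residual per-module update


class NACSketch(nn.Module):
    """Toy 'Neural Attentive Circuit'-style classifier: configure, then execute."""

    def __init__(self, input_dim=64, code_dim=128, num_modules=16, topk=4, num_classes=10):
        super().__init__()
        self.config = ConfigurationNet(num_modules, code_dim, topk)
        self.executor = ModularExecutor(input_dim, code_dim)
        self.head = nn.Linear(code_dim, num_classes)

    def forward(self, x):
        codes, connectivity = self.config()             # system 1: decide the module layout
        states = self.executor(x, codes, connectivity)  # system 2: run it on the input
        return self.head(states.mean(dim=1))            # pool over modules and classify


if __name__ == "__main__":
    model = NACSketch()
    tokens = torch.randn(2, 32, 64)   # a toy batch of 32 input tokens per example
    print(model(tokens).shape)        # torch.Size([2, 10])
```

The point of the sketch is the division of labor: the configuration component is input-independent and trained end-to-end with the executor, while the executor only passes messages along the few connections the configuration keeps, which loosely mirrors how sparse module connectivity can reduce inference cost.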
Related papers
- Jointly-Learned Exit and Inference for a Dynamic Neural Network: JEI-DNN [20.380620709345898]
Early-exiting dynamic neural networks (EDNN) allow a model to make some of its predictions from intermediate layers (i.e., early-exit).
Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations.
We propose a novel architecture that connects these two modules, which leads to significant performance improvements on classification datasets and enables better uncertainty characterization (a minimal early-exit sketch appears after this list).
arXiv Detail & Related papers (2023-10-13T14:56:38Z) - Adaptive Growth: Real-time CNN Layer Expansion [0.0]
This research presents a new algorithm that allows the convolutional layer of a Convolutional Neural Network (CNN) to dynamically evolve based on data input.
Instead of a rigid architecture, our approach iteratively introduces kernels to the convolutional layer, gauging its real-time response to varying data.
Remarkably, our unsupervised method has outstripped its supervised counterparts across diverse datasets.
arXiv Detail & Related papers (2023-09-06T14:43:58Z) - Efficient Model Adaptation for Continual Learning at the Edge [15.334881190102895]
Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment.
Data distributions often shift over time due to changes in environmental factors, sensor characteristics, and the task of interest.
This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts.
arXiv Detail & Related papers (2023-08-03T23:55:17Z) - Modular Neural Network Approaches for Surgical Image Recognition [0.0]
We introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification.
Our experiments have shown that modular learning improves performance compared to non-modular systems.
In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.
arXiv Detail & Related papers (2023-07-17T22:28:16Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z) - Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z) - When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
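As a companion to the JEI-DNN entry above, here is a generic early-exit sketch: each block is paired with an intermediate inference module (a lightweight classification head) and a gating mechanism that decides whether to stop at that depth. Names, layer choices, and the confidence threshold are assumptions for illustration, not the JEI-DNN architecture.

```python
# Generic early-exit sketch (not the JEI-DNN implementation), assuming PyTorch.
# The gating rule, threshold, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, num_blocks=4, num_classes=10, exit_threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        # Intermediate inference modules (IMs): one lightweight classifier per depth.
        self.exits = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(num_blocks)])
        # Gating mechanism (GM): one scalar "exit here?" score per depth.
        self.gates = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_blocks)])
        self.exit_threshold = exit_threshold

    def forward(self, x):
        # At inference time, return from the first depth whose gate is confident enough.
        for block, exit_head, gate in zip(self.blocks, self.exits, self.gates):
            x = block(x)
            if torch.sigmoid(gate(x)).mean() > self.exit_threshold:
                return exit_head(x)
        return self.exits[-1](x)  # fall through to the deepest inference module


if __name__ == "__main__":
    net = EarlyExitNet()
    print(net(torch.randn(1, 64)).shape)  # torch.Size([1, 10])
```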