Neural Attentive Circuits
- URL: http://arxiv.org/abs/2210.08031v1
- Date: Fri, 14 Oct 2022 18:00:07 GMT
- Title: Neural Attentive Circuits
- Authors: Nasim Rahaman and Martin Weiss and Francesco Locatello and Chris Pal
and Yoshua Bengio and Bernhard Schölkopf and Erran Li and Nicolas Ballas
- Abstract summary: We introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs)
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
- Score: 93.95502541529115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has seen the development of general purpose neural architectures
that can be trained to perform tasks across diverse data modalities. General
purpose models typically make few assumptions about the underlying
data-structure and are known to perform well in the large-data regime. At the
same time, there has been growing interest in modular neural architectures that
represent the data using sparsely interacting modules. These models can be more
robust out-of-distribution, computationally efficient, and capable of
sample-efficient adaptation to new data. However, they tend to make
domain-specific assumptions about the data, and present challenges in how
module behavior (i.e., parameterization) and connectivity (i.e., their layout)
can be jointly learned. In this work, we introduce a general purpose, yet
modular neural architecture called Neural Attentive Circuits (NACs) that
jointly learns the parameterization and a sparse connectivity of neural modules
without using domain knowledge. NACs are best understood as the combination of
two systems that are jointly trained end-to-end: one that determines the module
configuration and the other that executes it on an input. We demonstrate
qualitatively that NACs learn diverse and meaningful module configurations on
the NLVR2 dataset without additional supervision. Quantitatively, we show that
by incorporating modularity in this way, NACs improve upon a strong non-modular
baseline in terms of low-shot adaptation on the CIFAR and CUB datasets by about
10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that
NACs can achieve an 8x speedup at inference time while losing less than 3%
performance. Finally, we find NACs to yield competitive results on diverse data
modalities spanning point-cloud classification, symbolic processing and
text classification from ASCII bytes, thereby confirming their general purpose
nature.
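To make the two-system description in the abstract concrete, below is a minimal PyTorch sketch of the idea: one component (here called ConfigurationNet) produces per-module parameterizations and a sparse, attention-derived connectivity pattern, and a second component (ModularExecutor) runs the configured modules on the input. The class names, layer choices, and top-k routing rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a "configure + execute" modular network, assuming PyTorch.
# Class names and design details below are illustrative, not the NAC implementation.
import torch
import torch.nn as nn


class ConfigurationNet(nn.Module):
    """Produces per-module parameterizations (code vectors) and a sparse,
    row-normalized module-to-module connectivity pattern from learned seeds."""

    def __init__(self, num_modules: int, code_dim: int, topk: int):
        super().__init__()
        self.seeds = nn.Parameter(torch.randn(num_modules, code_dim))
        self.to_code = nn.Linear(code_dim, code_dim)
        self.topk = topk

    def forward(self):
        codes = self.to_code(self.seeds)                     # (M, D) module parameterizations
        scores = codes @ codes.t() / codes.shape[-1] ** 0.5  # (M, M) module-module affinities
        # Keep only the top-k incoming connections per module (sparse connectivity).
        mask = torch.full_like(scores, float("-inf"))
        topv, topi = scores.topk(self.topk, dim=-1)
        mask.scatter_(-1, topi, topv)
        connectivity = mask.softmax(dim=-1)                  # zeros outside the top-k entries
        return codes, connectivity


class ModularExecutor(nn.Module):
    """Runs the configured modules on an input: every module reads the input via
    cross-attention, then exchanges messages only along the sparse connectivity."""

    def __init__(self, input_dim: int, code_dim: int):
        super().__init__()
        self.read = nn.MultiheadAttention(code_dim, num_heads=4, kdim=input_dim,
                                          vdim=input_dim, batch_first=True)
        self.update = nn.Sequential(nn.Linear(code_dim, code_dim), nn.GELU(),
                                    nn.Linear(code_dim, code_dim))

    def forward(self, x, codes, connectivity):
        # x: (B, T, input_dim) input tokens; codes: (M, D); connectivity: (M, M)
        states = codes.unsqueeze(0).expand(x.shape[0], -1, -1)      # (B, M, D) module states
        states, _ = self.read(states, x, x)                         # modules attend to the input
        states = torch.einsum("mn,bnd->bmd", connectivity, states)  # sparse message passing
        return states + self.update(states)                         # residual per-module update


class NACSketch(nn.Module):
    """Toy 'Neural Attentive Circuit'-style classifier: configure, then execute."""

    def __init__(self, input_dim=64, code_dim=128, num_modules=16, topk=4, num_classes=10):
        super().__init__()
        self.config = ConfigurationNet(num_modules, code_dim, topk)
        self.executor = ModularExecutor(input_dim, code_dim)
        self.head = nn.Linear(code_dim, num_classes)

    def forward(self, x):
        codes, connectivity = self.config()             # system 1: decide the module layout
        states = self.executor(x, codes, connectivity)  # system 2: run it on the input
        return self.head(states.mean(dim=1))            # pool over modules and classify


if __name__ == "__main__":
    model = NACSketch()
    tokens = torch.randn(2, 32, 64)   # a toy batch of 32 input tokens per example
    print(model(tokens).shape)        # torch.Size([2, 10])
```

The point of the sketch is the division of labor: the configuration component is input-independent and trained end-to-end with the executor, while the executor only passes messages along the few connections the configuration keeps, which loosely mirrors how sparse module connectivity can reduce inference cost.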
Related papers
- Jointly-Learned Exit and Inference for a Dynamic Neural Network: JEI-DNN [20.380620709345898]
Early-exiting dynamic neural networks (EDNN) allow a model to make some of its predictions from intermediate layers (i.e., early-exit).
Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations.
We propose a novel architecture that connects these two modules, which leads to significant performance improvements on classification datasets and enables better uncertainty characterization (a minimal early-exit sketch appears after this list).
arXiv Detail & Related papers (2023-10-13T14:56:38Z) - Adaptive Growth: Real-time CNN Layer Expansion [0.0]
This research presents a new algorithm that allows the convolutional layer of a Convolutional Neural Network (CNN) to dynamically evolve based on data input.
Instead of a rigid architecture, our approach iteratively introduces kernels to the convolutional layer, gauging its real-time response to varying data.
Remarkably, our unsupervised method has outstripped its supervised counterparts across diverse datasets.
arXiv Detail & Related papers (2023-09-06T14:43:58Z) - Efficient Model Adaptation for Continual Learning at the Edge [15.334881190102895]
Most machine learning (ML) systems assume stationary and matching data distributions during training and deployment.
Data distributions often shift over time due to changes in environmental factors, sensor characteristics, and the task of interest.
This paper presents the Encoder-Adaptor-Reconfigurator (EAR) framework for efficient continual learning under domain shifts.
arXiv Detail & Related papers (2023-08-03T23:55:17Z) - Modular Neural Network Approaches for Surgical Image Recognition [0.0]
We introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification.
Our experiments have shown that modular learning improves performance compared to non-modular systems.
In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.
arXiv Detail & Related papers (2023-07-17T22:28:16Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z) - Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose a use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z) - When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
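As a companion to the JEI-DNN entry above, here is a generic early-exit sketch: each block is paired with an intermediate inference module (a lightweight classification head) and a gating mechanism that decides whether to stop at that depth. Names, layer choices, and the confidence threshold are assumptions for illustration, not the JEI-DNN architecture.

```python
# Generic early-exit sketch (not the JEI-DNN implementation), assuming PyTorch.
# The gating rule, threshold, and layer choices are illustrative assumptions.
import torch
import torch.nn as nn


class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, num_blocks=4, num_classes=10, exit_threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        # Intermediate inference modules (IMs): one lightweight classifier per depth.
        self.exits = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(num_blocks)])
        # Gating mechanism (GM): one scalar "exit here?" score per depth.
        self.gates = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_blocks)])
        self.exit_threshold = exit_threshold

    def forward(self, x):
        # At inference time, return from the first depth whose gate is confident enough.
        for block, exit_head, gate in zip(self.blocks, self.exits, self.gates):
            x = block(x)
            if torch.sigmoid(gate(x)).mean() > self.exit_threshold:
                return exit_head(x)
        return self.exits[-1](x)  # fall through to the deepest inference module


if __name__ == "__main__":
    net = EarlyExitNet()
    print(net(torch.randn(1, 64)).shape)  # torch.Size([1, 10])
```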