Matrix Shuffle-Exchange Networks for Hard 2D Tasks
- URL: http://arxiv.org/abs/2006.15892v2
- Date: Mon, 5 Oct 2020 08:55:35 GMT
- Title: Matrix Shuffle-Exchange Networks for Hard 2D Tasks
- Authors: Emīls Ozoliņš, Kārlis Freivalds, Agris Šostaks
- Abstract summary: Matrix Shuffle-Exchange network can efficiently exploit long-range dependencies in 2D data.
It has comparable speed to a convolutional neural network.
- Score: 2.4493299476776778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks have become the main tools for processing
two-dimensional data. They work well for images, yet convolutions have a
limited receptive field that prevents their application to more complex 2D
tasks. We propose a new neural model, called Matrix Shuffle-Exchange network,
that can efficiently exploit long-range dependencies in 2D data and has
comparable speed to a convolutional neural network. It is derived from Neural
Shuffle-Exchange network and has $\mathcal{O}( \log{n})$ layers and
$\mathcal{O}( n^2 \log{n})$ total time and space complexity for processing an $n
\times n$ data matrix. We show that the Matrix Shuffle-Exchange network is
well-suited for algorithmic and logical reasoning tasks on matrices and dense
graphs, exceeding convolutional and graph neural network baselines. Its
distinct advantage is the capability of retaining full long-range dependency
modelling when generalizing to larger instances - much larger than could be
processed with models equipped with a dense attention mechanism.
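As a rough, hedged illustration of the stated costs (not the authors' implementation), the sketch below applies the 1D riffle ("perfect shuffle") permutation that Shuffle-Exchange style networks are built around and tallies the work for an $n \times n$ input; the function and variable names are illustrative.

```python
import numpy as np

def perfect_shuffle(x):
    """Riffle ('perfect shuffle') permutation along the last axis:
    interleave the two halves, as in Shuffle-Exchange style networks (1D view)."""
    half = x.shape[-1] // 2
    out = np.empty_like(x)
    out[..., 0::2] = x[..., :half]   # left half goes to even positions
    out[..., 1::2] = x[..., half:]   # right half goes to odd positions
    return out

n = 8                               # side length of the data matrix (power of two)
cells = n * n                       # n^2 cells to process
layers = int(np.log2(cells))        # O(log n) layers (constant factor illustrative)
total_work = layers * cells         # O(n^2) work per layer -> O(n^2 log n) overall
print(perfect_shuffle(np.arange(8)), total_work)   # [0 4 1 5 2 6 3 7] 384
```

Roughly, stacking O(log n) shuffle layers gives every cell a short path to every other cell, which is how long-range dependencies can be kept at near-convolutional cost.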
Related papers
- Training Multi-layer Neural Networks on Ising Machine [41.95720316032297]
This paper proposes an Ising learning algorithm to train quantized neural networks (QNNs).
As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines.
arXiv Detail & Related papers (2023-11-06T04:09:15Z)
- NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference [20.404864470321897]
We introduce NeuralMatrix, which elastically transforms the computations of entire deep neural network (DNN) models into linear matrix operations.
Experiments with both CNN and transformer-based models demonstrate the potential of NeuralMatrix to accurately and efficiently execute a wide range of DNN models.
This level of efficiency is usually only attainable with an accelerator designed for a specific neural network.
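To make "linear matrix operations" concrete, here is a minimal, hedged sketch (not NeuralMatrix itself): a toy MLP executed as plain matrix multiplies over a batch; how the paper maps nonlinear operations onto matrix form is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((32, 16)), rng.standard_normal((10, 32))
x = rng.standard_normal((16, 8))        # a batch of 8 inputs packed into one matrix

h = np.maximum(W1 @ x, 0.0)             # layer 1 as a single matrix multiply + ReLU
y = W2 @ h                              # layer 2 as a single matrix multiply
print(y.shape)                          # (10, 8) class scores for the whole batch
```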
arXiv Detail & Related papers (2023-05-23T12:03:51Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
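A minimal sketch of the dictionary (codebook) idea under toy assumptions; the grid size, codebook size, and nearest-neighbour assignment below are illustrative, not the paper's learned vector-quantized auto-decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.standard_normal((64, 64, 8))          # dense feature grid (H, W, feat_dim)
codebook = rng.standard_normal((256, 8))         # 256 code vectors (the dictionary)

flat = grid.reshape(-1, 8)
# nearest codebook entry per grid cell -> store only an 8-bit index per cell
dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
indices = dists.argmin(axis=1).astype(np.uint8)

reconstructed = codebook[indices].reshape(grid.shape)
orig_bytes = grid.astype(np.float32).nbytes
compressed_bytes = indices.nbytes + codebook.astype(np.float32).nbytes
print(orig_bytes / compressed_bytes)             # rough compression ratio (~10x here)
```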
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
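A minimal sketch of the random network in question, assuming an illustrative width and bias range: the first layer stays random, and only the second (linear) layer would be fit to the labels.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width = 16, 512
W1 = rng.standard_normal((width, d))          # untrained standard Gaussian weights
b1 = rng.uniform(-1.0, 1.0, size=width)       # uniformly distributed biases

def random_features(x):
    """First (random) layer: x -> ReLU(W1 x + b1)."""
    return np.maximum(W1 @ x + b1, 0.0)

x = rng.standard_normal(d)
phi = random_features(x)
print(phi.shape)   # (512,) random features fed to a trainable linear separator
```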
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- SITHCon: A neural network robust to variations in input scaling on the time dimension [0.0]
In machine learning, convolutional neural networks (CNNs) have been extremely influential in both computer vision and in recognizing patterns extended over time.
This paper introduces a Scale-Invariant Temporal History Convolution network (SITHCon) that uses a logarithmically-distributed temporal memory.
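A hedged sketch of the general idea of a logarithmically distributed temporal memory (not SITHCon's exact representation): the signal is read at geometrically spaced lags, so rescaling the input in time shifts the readout rather than distorting it. The tap count and base are illustrative.

```python
import numpy as np

def log_spaced_taps(signal, n_taps=8, base=2):
    """Read the signal at geometrically spaced lags: 1, 2, 4, ... steps back."""
    lags = base ** np.arange(n_taps)
    t = len(signal) - 1
    return np.array([signal[max(t - lag, 0)] for lag in lags])

signal = np.sin(0.1 * np.arange(200))
print(log_spaced_taps(signal))
```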
arXiv Detail & Related papers (2021-07-09T18:11:50Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712$\times$ speedup with 1301.25$\times$ energy reduction on CPU, and a 35.4$\times$ speedup with 17.66$\times$ energy reduction on GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
- Partitioning sparse deep neural networks for scalable training and inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
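As a hedged illustration of weight binarization in a graph layer (a common sign-and-scale rule, not necessarily one of the specific strategies evaluated in the paper):

```python
import numpy as np

def binarize(w):
    """Map real weights to {-1, +1} times one real scale per tensor."""
    return np.abs(w).mean() * np.sign(w)

rng = np.random.default_rng(0)
A_hat = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)   # toy adjacency with self-loops
X = rng.standard_normal((5, 8))                        # node features
W = rng.standard_normal((8, 4))                        # layer weights

H = np.maximum(A_hat @ X @ binarize(W), 0.0)           # one graph-conv layer, binarized weights
print(H.shape)                                         # (5, 4)
```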
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representations can achieve improved sample complexity compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification, and 3D mesh representation learning.
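A hedged sketch of a polynomial block in this spirit: the output is a polynomial of the input built from linear maps and element-wise (Hadamard) products; the degree and widths below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 16, 32
U1, U2, U3 = (rng.standard_normal((h, d)) * 0.1 for _ in range(3))
C = rng.standard_normal((1, h)) * 0.1

def pi_block(z):
    x1 = U1 @ z                       # degree-1 term
    x2 = x1 * (U2 @ z) + x1           # degree-2 term via a Hadamard product
    x3 = x2 * (U3 @ z) + x2           # degree-3 term
    return C @ x3                     # output is a cubic polynomial in z

print(pi_block(rng.standard_normal(d)))
```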
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
- Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T) [17.13246260883765]
Deep neural networks (DNNs) have shown remarkable success in a variety of machine learning applications.
In recent years, there is an increasing interest in deploying DNNs to resource-constrained devices with limited energy, memory, and computational budget.
We propose Entropy-Constrained Trained Ternarization (EC2T), a general framework to create sparse and ternary neural networks.
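A hedged sketch of ternarization itself: weights map to {-s, 0, +s}, giving sparsity from the zeros plus 2-bit storage. The thresholding rule below is a common heuristic, not the entropy-constrained procedure of EC2T.

```python
import numpy as np

def ternarize(w, delta_ratio=0.7):
    delta = delta_ratio * np.abs(w).mean()        # weights below this threshold become 0
    mask = np.abs(w) > delta
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(w) * mask              # values in {-scale, 0, +scale}

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
print(ternarize(w))
```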
arXiv Detail & Related papers (2020-04-02T15:38:00Z)