HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full
Context Interaction
- URL: http://arxiv.org/abs/2401.17948v1
- Date: Wed, 31 Jan 2024 15:57:21 GMT
- Title: HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full
Context Interaction
- Authors: Harvie Zhang
- Abstract summary: The self-attention mechanism utilizes large implicit weight matrices, programmed through dot product-based activations with very few trainable parameters, to enable long sequence modeling.
In this paper, we investigate the possibility of discarding residual learning by employing large implicit kernels to achieve full context interaction at each layer of the network.
Our model incorporates several innovative components and exhibits excellent properties, such as introducing local feedback error for updating the slow network, stable zero-mean features, faster training convergence, and fewer model parameters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The self-attention mechanism utilizes large implicit weight matrices,
programmed through dot product-based activations with very few trainable
parameters, to enable long sequence modeling. In this paper, we investigate the
possibility of discarding residual learning by employing large implicit kernels
to achieve full context interaction at each layer of the network. To accomplish
this, we introduce coordinate-based implicit MLPs as a slow network to generate
hyper-kernels for another fast convolutional network. To get context-varying
weights for fast dynamic encoding, we propose a
$\mathrm{Hyper}\mathcal{Z{\cdot}Z{\cdot}W}$ operator that connects
hyper-kernels ($\mathcal{W}$) and hidden activations ($\mathcal{Z}$) through
simple elementwise multiplication, followed by convolution of $\mathcal{Z}$
using the context-dependent $\mathcal{W}$. Based on this design, we present a
novel Terminator architecture that integrates hyper-kernels of different sizes
to produce multi-branch hidden representations for enhancing the feature
extraction capability of each layer. Additionally, a bottleneck layer is
employed to compress the concatenated channels, allowing only valuable
information to propagate to the subsequent layers. Notably, our model
incorporates several innovative components and exhibits excellent properties,
such as introducing local feedback error for updating the slow network, stable
zero-mean features, faster training convergence, and fewer model parameters.
Extensive experimental results on pixel-level 1D and 2D image classification
benchmarks demonstrate the superior performance of our architecture.
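The operator described above is compact enough to sketch. The following is a minimal, illustrative PyTorch sketch of the idea and not the authors' implementation: the names `SlowHyperKernel` and `hyper_zzw`, the MLP depth, the GELU activations, and the FFT-based circular convolution are all assumptions made here for concreteness.

```python
# Minimal sketch (not the paper's code) of the HyperZ.Z.W idea: a coordinate-based
# implicit MLP (the "slow network") emits a full-size hyper-kernel W, which is
# combined with hidden activations Z by elementwise multiplication and then used
# as a context-dependent kernel to convolve Z.
import torch
import torch.nn as nn


class SlowHyperKernel(nn.Module):
    """Coordinate-based implicit MLP mapping normalized (x, y) coordinates to a
    per-channel hyper-kernel the same size as the feature map (an assumption)."""

    def __init__(self, channels: int, height: int, width: int, hidden: int = 64):
        super().__init__()
        self.shape = (channels, height, width)
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, channels),
        )
        # Fixed grid of normalized coordinates in [-1, 1] x [-1, 1].
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, height),
            torch.linspace(-1.0, 1.0, width),
            indexing="ij",
        )
        self.register_buffer("coords", torch.stack([xs, ys], dim=-1).reshape(-1, 2))

    def forward(self) -> torch.Tensor:
        c, h, w = self.shape
        out = self.mlp(self.coords)      # (H*W, C)
        return out.t().reshape(c, h, w)  # (C, H, W) hyper-kernel


def hyper_zzw(z: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """HyperZ.Z.W operator, sketched: elementwise multiplication of the hyper-kernel
    W with the hidden activations Z yields context-varying weights, which are then
    used to convolve Z. The circular convolution via FFT is an implementation
    choice made here for the full-size kernel, not prescribed by the paper.

    z: (B, C, H, W) hidden activations from the fast network
    w: (C, H, W) hyper-kernel from the slow network
    """
    w_ctx = z * w.unsqueeze(0)                    # context-dependent kernel, (B, C, H, W)
    z_f = torch.fft.rfft2(z)
    w_f = torch.fft.rfft2(w_ctx)
    return torch.fft.irfft2(z_f * w_f, s=z.shape[-2:])


if __name__ == "__main__":
    # Toy shapes only. The Terminator architecture described above would run
    # several such branches with hyper-kernels of different sizes, concatenate
    # their outputs, and compress the channels with a bottleneck layer
    # (e.g. a 1x1 convolution).
    slow = SlowHyperKernel(channels=16, height=32, width=32)
    z = torch.randn(4, 16, 32, 32)
    out = hyper_zzw(z, slow())
    print(out.shape)  # torch.Size([4, 16, 32, 32])
```

For a kernel spanning the whole feature map, the FFT route keeps the per-channel cost near O(HW log HW), which is one plausible way to make full context interaction at every layer affordable.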
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z) - "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - SymbolNet: Neural Symbolic Regression with Adaptive Dynamic Pruning [1.0356366043809717]
We propose a neural network approach to symbolic regression in a novel framework that allows dynamic pruning of model weights, input features, and mathematical operators in a single training process.
Our approach enables symbolic regression to achieve fast inference with nanosecond-scale latency on FPGAs for high-dimensional datasets in environments with stringent computational resource constraints.
arXiv Detail & Related papers (2024-01-18T12:51:38Z) - Kronecker-Factored Approximate Curvature for Modern Neural Network
Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z) - Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z) - Dynamic ConvNets on Tiny Devices via Nested Sparsity [3.0313758880048765]
This work introduces a new training and compression pipeline to build Nested Sparse ConvNets.
A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets.
The pipeline is tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit.
arXiv Detail & Related papers (2022-03-07T12:07:02Z) - Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
arXiv Detail & Related papers (2022-01-16T07:22:47Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which gives the network greater representational capacity.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Sparse Coding Driven Deep Decision Tree Ensembles for Nuclear
Segmentation in Digital Pathology Images [15.236873250912062]
We propose an easily trained yet powerful representation learning approach with performance highly competitive to deep neural networks in a digital pathology image segmentation task.
The method, called sparse coding driven deep decision tree ensembles that we abbreviate as ScD2TE, provides a new perspective on representation learning.
arXiv Detail & Related papers (2020-08-13T02:59:31Z)