HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
- URL: http://arxiv.org/abs/2502.00047v2
- Date: Wed, 05 Feb 2025 08:22:28 GMT
- Title: HadamRNN: Binary and Sparse Ternary Orthogonal RNNs
- Authors: Armand Foucault, Franck Mamalet, François Malgouyres
- Abstract summary: Binary and sparse ternary weights in neural networks enable faster computations and lighter representations.
Vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging.
We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices.
- Score: 6.524758376347808
- Abstract: Binary and sparse ternary weights in neural networks enable faster computations and lighter representations, facilitating their use on edge devices with limited computational power. Meanwhile, vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. To date, no method has successfully achieved binarization or ternarization of vanilla RNN weights. We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices. This method enables the training of orthogonal RNNs (ORNNs) with binary and sparse ternary recurrent weights, effectively creating a specific class of binary and sparse ternary vanilla RNNs. The resulting ORNNs, called HadamRNN and Block-HadamRNN, are evaluated on benchmarks such as the copy task, permuted and sequential MNIST tasks, and the IMDB dataset. Despite binarization or sparse ternarization, these RNNs maintain performance levels comparable to state-of-the-art full-precision models, highlighting the effectiveness of our approach. Notably, our approach is the first solution with binary recurrent weights capable of tackling the copy task over 1000 timesteps.
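The link between Hadamard matrices and binary orthogonal recurrent weights can be illustrated with a minimal sketch. It is not the authors' parameterization: it assumes the classical Sylvester construction, and the names `sylvester_hadamard`, `gram`, `recurrent_step`, and `norm` are hypothetical helpers introduced only for illustration. A Hadamard matrix H of size n has entries in {+1, -1} and satisfies H Hᵀ = n I, so H / sqrt(n) is orthogonal and applying it preserves the norm of the hidden state.

```python
import math


def sylvester_hadamard(k):
    """Return the 2**k x 2**k Sylvester Hadamard matrix as rows of +1/-1."""
    h = [[1]]
    for _ in range(k):
        # Doubling step: H_2n = [[H, H], [H, -H]]
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def gram(m):
    """Return m times its transpose for a square integer matrix."""
    n = len(m)
    return [[sum(m[i][t] * m[j][t] for t in range(n)) for j in range(n)]
            for i in range(n)]


def recurrent_step(h_mat, state):
    """Apply W = H / sqrt(n) using sign flips, additions, and one scaling."""
    scale = 1.0 / math.sqrt(len(h_mat))
    return [scale * sum(x if s == 1 else -x for s, x in zip(row, state))
            for row in h_mat]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


H = sylvester_hadamard(3)   # 8 x 8 matrix with binary entries
n = len(H)
G = gram(H)                 # equals n * I for a Hadamard matrix

state = [float(i + 1) for i in range(n)]
new_state = recurrent_step(H, state)
print(norm(state), norm(new_state))  # the two norms agree up to rounding
```

Because every weight is a sign, the matrix-vector product needs no multiplications beyond the single shared scale factor, which is the kind of saving the abstract attributes to binary weights.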
Related papers
- Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues [65.41946981594567]
Linear Recurrent Neural Networks (LRNNs) have emerged as efficient alternatives to Transformers in large language modeling.
LRNNs struggle to perform state-tracking which may impair performance in tasks such as code evaluation or tracking a chess game.
Our work enhances the expressivity of modern LRNNs, broadening their applicability without changing the cost of training or inference.
arXiv Detail & Related papers (2024-11-19T14:35:38Z)
- Quantized Approximately Orthogonal Recurrent Neural Networks [6.524758376347808]
We explore the quantization of the weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs).
We propose and investigate two strategies to learn QORNNs by combining quantization-aware training (QAT) and orthogonal projections.
The most efficient models achieve results similar to state-of-the-art full-precision ORNN, LSTM and FastRNN on a variety of standard benchmarks, even with 4-bits quantization.
arXiv Detail & Related papers (2024-02-05T09:59:57Z)
- Projected Stochastic Gradient Descent with Quantum Annealed Binary Gradients [51.82488018573326]
We present QP-SBGD, a novel layer-wise optimiser tailored towards training neural networks with binary weights.
BNNs reduce the computational requirements and energy consumption of deep learning models with minimal loss in accuracy.
Our algorithm is implemented layer-wise, making it suitable to train larger networks on resource-limited quantum hardware.
arXiv Detail & Related papers (2023-10-23T17:32:38Z)
- On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN).
We prove there exists a class of $2^{nd}$ RNNs that is Turing-complete with bounded time.
We also demonstrate that $2^{nd}$-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z)
- Adaptive-saturated RNN: Remember more with less instability [2.191505742658975]
This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two approaches.
Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors.
arXiv Detail & Related papers (2023-04-24T02:28:03Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
- Examining the Robustness of Spiking Neural Networks on Non-ideal Memristive Crossbars [4.184276171116354]
Spiking Neural Networks (SNNs) have emerged as the low-power alternative to Artificial Neural Networks (ANNs).
We study the effect of crossbar non-idealities and intrinsic stochasticity on the performance of SNNs.
arXiv Detail & Related papers (2022-06-20T07:07:41Z)
- Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers [67.688697838109]
This paper presents a novel method to train quantized RNNLMs from scratch using alternating direction methods of multipliers (ADMM).
Experiments on two tasks suggest the proposed ADMM quantization achieved a model size compression factor of up to 31 times over the full precision baseline RNNLMs.
arXiv Detail & Related papers (2021-11-29T09:30:06Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and the hardware deployment on FPGA validate the great potentials of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Block-term Tensor Neural Networks [29.442026567710435]
We show that block-term tensor layers (BT-layers) can be easily adapted to neural network models, such as CNNs and RNNs.
BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
arXiv Detail & Related papers (2020-10-10T09:58:43Z)