LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural
Networks
- URL: http://arxiv.org/abs/2301.04794v1
- Date: Thu, 12 Jan 2023 03:39:59 GMT
- Title: LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural
Networks
- Authors: Nelly Elsayed, Zag ElSayed, Anthony S. Maida
- Abstract summary: Long short-term memory (LSTM) is one of the robust recurrent neural network architectures for learning sequential data.
This paper proposes a novel LiteLSTM architecture based on reducing the LSTM computation components via the weights sharing concept.
The proposed LiteLSTM has comparable accuracy to other state-of-the-art recurrent architectures while using a smaller computation budget.
- Score: 1.1602089225841632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long short-term memory (LSTM) is one of the most robust recurrent neural
network architectures for learning sequential data. However, it requires considerable
computational power to train and to implement in both software and hardware.
This paper proposes a novel LiteLSTM architecture that reduces the LSTM
computation components via the weights sharing concept, lowering the overall
computation cost of the architecture while maintaining its performance. The
proposed LiteLSTM can be significant for processing large volumes of data where
processing time is critical and hardware resources are limited, such as in IoT
device security and medical data processing. The proposed model was evaluated
and tested empirically on three datasets from the computer vision,
cybersecurity, and speech emotion recognition domains. The proposed LiteLSTM
achieves accuracy comparable to other state-of-the-art recurrent architectures
while using a smaller computation budget.
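To make the weight-sharing idea concrete, the following is a minimal, illustrative sketch of an LSTM cell whose gates reuse a single pair of input and recurrent weight matrices while keeping gate-specific biases. It is a sketch of the general concept only, not the exact LiteLSTM formulation from the paper; the class name, sizes, and initialization are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SharedWeightLSTMCell:
    """Illustrative LSTM cell in which all gates reuse one input-to-hidden
    matrix W and one hidden-to-hidden matrix U instead of keeping separate
    matrices per gate. Only the biases stay gate-specific, so the recurrent
    parameter count drops by roughly the number of gates that share weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)
        # One shared input-to-hidden and one shared hidden-to-hidden matrix.
        self.W = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        # Gate-specific biases keep the gates from producing identical signals.
        self.b_i = np.zeros(hidden_size)
        self.b_f = np.ones(hidden_size)   # common practice: bias forget gate open
        self.b_o = np.zeros(hidden_size)
        self.b_g = np.zeros(hidden_size)

    def step(self, x, h_prev, c_prev):
        # The pre-activation is computed once and reused by every gate.
        z = self.W @ x + self.U @ h_prev
        i = sigmoid(z + self.b_i)          # input gate
        f = sigmoid(z + self.b_f)          # forget gate
        o = sigmoid(z + self.b_o)          # output gate
        g = np.tanh(z + self.b_g)          # candidate memory
        c = f * c_prev + i * g             # cell state update
        h = o * np.tanh(c)                 # hidden state
        return h, c

# Usage: run the cell over a short random sequence.
cell = SharedWeightLSTMCell(input_size=8, hidden_size=16)
h, c = np.zeros(16), np.zeros(16)
for x in np.random.default_rng(1).normal(size=(5, 8)):
    h, c = cell.step(x, h, c)
print(h.shape, c.shape)  # (16,) (16,)
```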
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
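As a toy illustration of the weight-oriented compression steps named above (magnitude pruning and quantization), the sketch below zeroes out the smallest entries of a random weight matrix and quantizes the rest to 8 bits. The sparsity level and bit-width are arbitrary examples, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512)).astype(np.float32)  # stand-in for a model weight

# Magnitude pruning at 50% sparsity: drop the smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.5)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# Symmetric uniform quantization to int8 with a single per-tensor scale.
scale = np.abs(W_pruned).max() / 127.0
W_int8 = np.round(W_pruned / scale).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale

print("sparsity:", np.mean(W_pruned == 0.0))
print("max dequantization error:", np.abs(W_dequant - W_pruned).max())
```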
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses the scalability concerns of LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z)
- Implementation Guidelines and Innovations in Quantum LSTM Networks [2.938337278931738]
This paper presents a theoretical analysis and an implementation plan for a Quantum LSTM model, which seeks to integrate quantum computing principles with traditional LSTM networks.
The actual architecture and its practical effectiveness in enhancing sequential data processing remain to be developed and demonstrated in future work.
arXiv Detail & Related papers (2024-06-13T10:26:14Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition [22.502146009817416]
Long short-term memory (LSTM) is a powerful deep neural network that has been widely used in sequence analysis and modeling applications.
In this paper, we propose to perform algorithm and hardware co-design towards high-performance energy-efficient LSTM networks.
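The title refers to hierarchical Tucker tensor decomposition. As a rough, simplified illustration of the underlying idea of replacing a dense LSTM weight matrix with a compact factorized form, the sketch below uses a plain low-rank (SVD) factorization instead; the hierarchical Tucker format itself is more involved and is not reproduced here. All sizes and the rank are placeholder values.

```python
import numpy as np

# The dense input-to-hidden matrix of an LSTM layer (4*hidden x input,
# covering all four gates) is replaced by two thin factors A and B with a
# small rank r, cutting both storage and multiply-accumulate work.
input_size, hidden_size, rank = 256, 512, 32

rng = np.random.default_rng(0)
W_dense = rng.normal(size=(4 * hidden_size, input_size))  # random stand-in weights

# Truncated SVD gives the best rank-r approximation of W_dense.
U, S, Vt = np.linalg.svd(W_dense, full_matrices=False)
A = U[:, :rank] * S[:rank]        # (4*hidden, r)
B = Vt[:rank, :]                  # (r, input)

x = rng.normal(size=input_size)
y_dense = W_dense @ x             # original cost: 4*hidden*input MACs
y_lowrank = A @ (B @ x)           # factorized cost: r*(4*hidden + input) MACs

print("parameters:", W_dense.size, "->", A.size + B.size)
# A random matrix compresses poorly, so the error below is only indicative
# of the mechanics, not of accuracy on trained weights.
print("relative error:", np.linalg.norm(y_dense - y_lowrank) / np.linalg.norm(y_dense))
```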
arXiv Detail & Related papers (2022-12-05T05:51:56Z)
- Neural Architecture Search for Improving Latency-Accuracy Trade-off in Split Computing [5.516431145236317]
Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems.
In split computing, neural network models are separated and cooperatively processed using edge servers and IoT devices via networks.
This paper proposes a neural architecture search (NAS) method for split computing.
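A minimal sketch of the split-computing setup described above, with hypothetical layer weights: the head runs on the IoT device, its intermediate activation crosses the network, and the tail runs on the edge server. The NAS procedure the paper proposes for choosing the architecture and split point is not shown.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical two-part model: W1, W2 belong to the on-device "head",
# W3, W4 to the server-side "tail".  The split point (after layer 2 here)
# is exactly what a NAS method for split computing would search over,
# trading on-device latency against the size of the transmitted tensor.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(64, 32)), rng.normal(size=(16, 64))
W3, W4 = rng.normal(size=(64, 16)), rng.normal(size=(10, 64))

def head_on_device(x):
    # Runs on the IoT device; its output is the feature sent to the server.
    return relu(W2 @ relu(W1 @ x))

def tail_on_server(z):
    # Runs on the edge server; completes the inference.
    return W4 @ relu(W3 @ z)

x = rng.normal(size=32)
z = head_on_device(x)      # 16 values cross the network instead of 32 raw inputs
y = tail_on_server(z)
print(z.shape, y.shape)    # (16,) (10,)
```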
arXiv Detail & Related papers (2022-08-30T03:15:43Z)
- LiteLSTM Architecture for Deep Recurrent Neural Networks [1.1602089225841632]
Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning sequential data.
This paper proposes a novel LiteLSTM architecture based on reducing the components of the LSTM using the weights sharing concept.
The proposed LiteLSTM can be significant for learning from big data where time consumption is a critical concern.
arXiv Detail & Related papers (2022-01-27T16:33:02Z)
- Improving Deep Learning for HAR with shallow LSTMs [70.94062293989832]
We propose to alter the DeepConvLSTM to employ a 1-layered instead of a 2-layered LSTM.
Our results stand in contrast to the belief that one needs at least a 2-layered LSTM when dealing with sequential data.
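A minimal sketch of a DeepConvLSTM-style model with a single recurrent layer, as the paper advocates. The convolutional feature extractor, layer sizes, and sensor dimensions below are placeholders, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class ShallowConvLSTM(nn.Module):
    """Illustrative DeepConvLSTM-style model that uses one LSTM layer
    instead of the usual two."""

    def __init__(self, n_channels=9, n_classes=6, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        # num_layers=1: the change advocated by the paper, replacing the
        # 2-layer recurrent block of the original DeepConvLSTM.
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden,
                            num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, channels, time)
        feats = self.conv(x)              # (batch, 64, time)
        feats = feats.transpose(1, 2)     # (batch, time, 64) for the LSTM
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])      # classify from the last time step

model = ShallowConvLSTM()
logits = model(torch.randn(4, 9, 128))    # 4 windows, 9 sensor channels, 128 steps
print(logits.shape)                       # torch.Size([4, 6])
```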
arXiv Detail & Related papers (2021-08-02T08:14:59Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
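A minimal sketch of the frequency-domain split described above, assuming a 1-D signal and an explicit DCT-II basis built with NumPy. In a dynamic super-resolution network, the low-frequency part would be routed to a cheap branch and the high-frequency part to an expensive one; the branches themselves are omitted here, and the cutoff fraction is an arbitrary example.

```python
import numpy as np

def dct2_matrix(n):
    # Orthonormal DCT-II basis, built explicitly so the sketch only needs NumPy.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.cos(np.pi * (i + 0.5) * k / n) * np.sqrt(2.0 / n)
    M[0] *= np.sqrt(0.5)
    return M

def split_by_frequency(signal, low_fraction=0.25):
    """Split a 1-D signal into low- and high-frequency parts in the DCT domain."""
    n = signal.size
    M = dct2_matrix(n)
    coeffs = M @ signal
    cut = int(n * low_fraction)
    low, high = np.zeros(n), np.zeros(n)
    low[:cut] = coeffs[:cut]          # smooth content: cheap branch
    high[cut:] = coeffs[cut:]         # detail/edges: expensive branch
    # Inverse transform (M is orthonormal, so its transpose inverts it).
    return M.T @ low, M.T @ high

x = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.1 * np.random.default_rng(0).normal(size=64)
low_part, high_part = split_by_frequency(x)
print(np.allclose(low_part + high_part, x))   # True: the split is lossless
```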
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Near-Optimal Hardware Design for Convolutional Neural Networks [0.0]
This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks.
The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model.
An implementation based on the proposed hardware architecture has been applied in commercial AI products.
arXiv Detail & Related papers (2020-02-06T09:15:03Z)