Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for
Video Recognition with Hierarchical Tucker Tensor Decomposition
- URL: http://arxiv.org/abs/2212.02046v1
- Date: Mon, 5 Dec 2022 05:51:56 GMT
- Title: Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for
Video Recognition with Hierarchical Tucker Tensor Decomposition
- Authors: Yu Gong, Miao Yin, Lingyi Huang, Chunhua Deng, Yang Sui, Bo Yuan
- Abstract summary: Long short-term memory (LSTM) is a powerful deep neural network that has been widely used in sequence analysis and modeling applications.
In this paper, we propose to perform algorithm and hardware co-design towards high-performance energy-efficient LSTM networks.
- Score: 22.502146009817416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long short-term memory (LSTM) is a powerful type of deep neural network that has been widely used in sequence analysis and modeling applications. However, the large model size of LSTM networks makes their practical deployment very challenging, especially for video recognition tasks that require high-dimensional input data. Aiming to overcome this limitation and fully unlock the potential of LSTM models, in this paper we perform algorithm and hardware co-design towards high-performance, energy-efficient LSTM networks. At the algorithm level, we develop a fully decomposed hierarchical Tucker (FDHT) structure-based LSTM, namely FDHT-LSTM, which enjoys ultra-low model complexity while still achieving high accuracy. To fully reap this attractive algorithmic benefit, we further develop a customized hardware architecture that supports efficient execution of the proposed FDHT-LSTM model. With a carefully designed memory access scheme, the complicated matrix transformations are supported by the underlying hardware efficiently, on the fly, and without any access conflicts. Our evaluation shows that both the ultra-compact FDHT-LSTM models and the corresponding hardware accelerator achieve very high performance. Compared with state-of-the-art compressed LSTM models, FDHT-LSTM delivers both an order-of-magnitude reduction in model size and significant accuracy improvements across different video recognition datasets. Meanwhile, compared with TIE, the state-of-the-art hardware for tensor-decomposed models, our FDHT-LSTM architecture achieves higher throughput, better area efficiency, and better energy efficiency on the LSTM-Youtube workload. On the LSTM-UCF workload, our design also outperforms TIE with higher throughput, higher energy efficiency, and comparable area efficiency.
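To make the algorithm-level idea concrete, here is a minimal numpy sketch of a hierarchical Tucker (HT) factorized weight layer: a 64x64 matrix is tensorized into an order-4 tensor, represented by leaf frames and transfer tensors over a binary dimension tree, and applied to an input directly from the factors. The shapes, the single shared rank r, and the dimension tree are illustrative assumptions, not the paper's exact FDHT construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tensorize a dense 64x64 weight matrix as an order-4 tensor of shape
# (m1, n1, m2, n2) = (8, 8, 8, 8); all shapes and the rank are assumed.
m1 = n1 = m2 = n2 = 8
r = 4  # one illustrative rank shared by every node of the dimension tree

# HT factors for the dimension tree {1,2,3,4} -> {1,2}, {3,4} -> leaves:
U1 = rng.standard_normal((m1, r))      # leaf frame, output mode 1
U2 = rng.standard_normal((n1, r))      # leaf frame, input mode 1
U3 = rng.standard_normal((m2, r))      # leaf frame, output mode 2
U4 = rng.standard_normal((n2, r))      # leaf frame, input mode 2
B12 = rng.standard_normal((r, r, r))   # transfer tensor of node {1,2}
B34 = rng.standard_normal((r, r, r))   # transfer tensor of node {3,4}
Br = rng.standard_normal((r, r))       # root transfer matrix

x = rng.standard_normal(64)
X = x.reshape(n1, n2)

# y = W @ x computed directly from the factors; the dense matrix is
# never materialized, which is the point of FDHT-style layers.
y = np.einsum('ai,bj,ijk,cp,dq,pql,kl,bd->ac',
              U1, U2, B12, U3, U4, B34, Br, X).reshape(64)

# Sanity check against the explicitly reconstructed dense matrix.
W = np.einsum('ai,bj,ijk,cp,dq,pql,kl->abcd',
              U1, U2, B12, U3, U4, B34, Br)
W_mat = W.transpose(0, 2, 1, 3).reshape(64, 64)  # rows (a,c), cols (b,d)
assert np.allclose(y, W_mat @ x)
```

Even at this toy scale the factors hold 272 parameters in place of the 4,096 dense ones; the order-of-magnitude compression reported in the abstract comes from applying this kind of factorization to much larger LSTM weight matrices.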
Related papers
- Unlocking the Power of LSTM for Long Term Time Series Forecasting [27.245021350821638]
We propose a simple yet efficient algorithm named P-sLSTM, built upon sLSTM by incorporating patching and channel independence.
These modifications substantially enhance sLSTM's performance in time series forecasting (TSF), achieving state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T13:59:26Z)
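As a rough illustration of the two modifications named in the P-sLSTM entry above, the sketch below shows how patching and channel independence reshape a batch of multivariate series before any recurrent model sees it. The batch size, channel count, and patch length are assumptions; the actual P-sLSTM architecture has further details beyond this reshaping.

```python
import numpy as np

# Toy batch: B series, C channels, T time steps; P is an assumed patch length.
B, C, T, P = 2, 3, 96, 16
x = np.random.default_rng(1).standard_normal((B, C, T))

# Channel independence: treat every channel as its own univariate sample.
x = x.reshape(B * C, T)

# Patching: group consecutive steps, so the recurrent model sees T/P tokens
# of dimension P instead of T scalar steps, shortening the sequence 16x here.
tokens = x.reshape(B * C, T // P, P)
print(tokens.shape)  # (6, 6, 16) -> fed to an sLSTM (or any RNN) backbone
```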
- A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z)
- DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model, denoted Diffusion Vision Transformers (DiffiT).
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z)
- MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain [8.216911980865902]
Existing RNN models obtain multi-scale features only by stacking layers.
This paper proposes MS-LSTM wholly from a multi-scale perspective.
We theoretically analyze the training cost and performance of MS-LSTM and its components.
arXiv Detail & Related papers (2023-04-16T08:25:02Z)
- LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks [1.1602089225841632]
Long short-term memory (LSTM) is one of the robust recurrent neural network architectures for learning sequential data.
This paper proposes a novel LiteLSTM architecture that reduces the LSTM computation components via weight sharing.
The proposed LiteLSTM has accuracy comparable to other state-of-the-art recurrent architectures while using a smaller computation budget.
arXiv Detail & Related papers (2023-01-12T03:39:59Z)
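The weight-sharing concept behind LiteLSTM can be illustrated with a toy cell in which all four gates reuse a single input matrix and a single recurrent matrix, keeping only separate biases. This is a hedged sketch of the general idea, not the paper's exact sharing scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def shared_weight_lstm_step(x, h, c, W, U, b):
    """One LSTM step where the four gates share W and U (separate biases
    only), cutting the gate weights roughly 4x versus a standard cell."""
    z = x @ W + h @ U          # computed once, reused by all four gates
    i = sigmoid(z + b[0])      # input gate
    f = sigmoid(z + b[1])      # forget gate
    o = sigmoid(z + b[2])      # output gate
    g = np.tanh(z + b[3])      # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

d_in, d_h = 16, 32
rng = np.random.default_rng(2)
W = rng.standard_normal((d_in, d_h)) * 0.1  # one shared matrix, not four
U = rng.standard_normal((d_h, d_h)) * 0.1
b = np.zeros((4, d_h))
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = shared_weight_lstm_step(rng.standard_normal(d_in), h, c, W, U, b)
```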
- Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory (LSTM) training framework that involves a novel ANN-to-SNN conversion framework, followed by SNN training.
We evaluate our framework on sequential learning tasks including the temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z)
- Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on HD video streams.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency parts are processed with expensive operations, while the lower-frequency parts are assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
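A minimal sketch of the DCT-domain split described above: the spectrum is divided by an assumed cutoff into a low-frequency part, which would be routed to cheap operations, and a high-frequency part for the expensive ones. The cutoff and shapes are illustrative, not the paper's routing policy; note the split is exact because the DCT is linear.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(3)
img = rng.standard_normal((32, 32))  # stand-in for an input image patch

# 2-D DCT: low frequencies live in the top-left corner of the spectrum.
coeffs = dctn(img, norm='ortho')
u, v = np.meshgrid(np.arange(32), np.arange(32), indexing='ij')
low_mask = (u + v) < 16              # assumed cutoff, purely illustrative

low = idctn(np.where(low_mask, coeffs, 0.0), norm='ortho')   # cheap branch
high = idctn(np.where(low_mask, 0.0, coeffs), norm='ortho')  # expensive branch
assert np.allclose(low + high, img)  # linearity makes the split lossless
```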
- Compressing LSTM Networks by Matrix Product Operators [7.395226141345625]
Long Short-Term Memory (LSTM) models are the building blocks of many state-of-the-art natural language processing (NLP) and speech enhancement (SE) algorithms.
Here we introduce the matrix product operator (MPO) decomposition, which describes the local correlation of quantum states in quantum many-body physics.
We propose an MPO-based neural network architecture to replace the LSTM model.
arXiv Detail & Related papers (2020-12-22T11:50:06Z)
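The MPO idea can be sketched with a two-core toy example: a 64x64 weight matrix is represented by two small four-way cores, so an MPO-based layer stores and trains the cores instead of the dense matrix. The mode sizes and bond dimension are assumptions; the paper decomposes the actual LSTM matrices with its own core structure.

```python
import numpy as np

rng = np.random.default_rng(4)
m = (8, 8)  # row (output) mode sizes: 64 = 8*8
n = (8, 8)  # column (input) mode sizes: 64 = 8*8
r = 4       # assumed MPO bond dimension

# Two MPO cores G_k of shape (r_{k-1}, m_k, n_k, r_k), boundary ranks 1:
G1 = rng.standard_normal((1, m[0], n[0], r))
G2 = rng.standard_normal((r, m[1], n[1], 1))

# Contract the cores back into the full matrix (done here only to count
# parameters; an MPO layer contracts the input against the cores instead).
W = np.einsum('aijb,bklc->ikjl', G1, G2).reshape(m[0] * m[1], n[0] * n[1])

print(W.size)             # 4096 dense parameters ...
print(G1.size + G2.size)  # ... versus 512 MPO parameters at this toy scale
```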
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)