QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling
- URL: http://arxiv.org/abs/2512.06582v1
- Date: Sat, 06 Dec 2025 22:29:19 GMT
- Title: QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling
- Authors: Isaac Kofi Nti
- Abstract summary: This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture designed to address both challenges through two independent components. We evaluate QL-LSTM on sentiment classification using the IMDB dataset with extended document lengths, comparing it to LSTM, GRU, and BiLSTM reference models. Although the PSUG and HGR-ASC components are more efficient per time step, the current prototype remains limited by the inherent sequential nature of recurrent models and therefore does not yet yield wall-clock speed improvements without further kernel-level optimization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ability to retain information across long temporal distances. This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture designed to address both challenges through two independent components. The Parameter-Shared Unified Gating mechanism replaces all gate-specific transformations with a single shared weight matrix, reducing parameters by approximately 48 percent while preserving full gating behavior. The Hierarchical Gated Recurrence with Additive Skip Connections component adds a multiplication-free pathway that improves long-range information flow and reduces forget-gate degradation. We evaluate QL-LSTM on sentiment classification using the IMDB dataset with extended document lengths, comparing it to LSTM, GRU, and BiLSTM reference models. QL-LSTM achieves competitive accuracy while using substantially fewer parameters. Although the PSUG and HGR-ASC components are more efficient per time step, the current prototype remains limited by the inherent sequential nature of recurrent models and therefore does not yet yield wall-clock speed improvements without further kernel-level optimization.
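The abstract gives enough to sketch both components in PyTorch. The sketch below is an illustration under stated assumptions, not the authors' implementation: the class name, the per-gate diagonal scales and biases, and the form of the skip projection are all guesses, since the paper's exact equations are not reproduced in this listing.

```python
import torch
import torch.nn as nn


class QLLSTMCellSketch(nn.Module):
    """Illustrative cell combining a shared gate projection (the PSUG idea)
    with an additive skip pathway into the cell state (the HGR-ASC idea).
    Names and equation details here are assumptions, not the paper's."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # PSUG idea: a single shared weight matrix replaces the four
        # gate-specific projections (i, f, o, g) of a standard LSTM.
        self.shared = nn.Linear(input_size + hidden_size, hidden_size)
        # Cheap per-gate diagonal scales and biases keep the four gates
        # distinguishable while the expensive matmul is shared (a guess at
        # how "full gating behavior" might be preserved).
        self.gate_scale = nn.Parameter(torch.ones(4, hidden_size))
        self.gate_bias = nn.Parameter(torch.zeros(4, hidden_size))
        # HGR-ASC idea: features added to the cell state without passing
        # through the forget-gate product.
        self.skip = nn.Linear(input_size, hidden_size)

    def forward(self, x, state):
        h, c = state
        z = self.shared(torch.cat([x, h], dim=-1))  # one matmul for all gates
        i = torch.sigmoid(self.gate_scale[0] * z + self.gate_bias[0])  # input gate
        f = torch.sigmoid(self.gate_scale[1] * z + self.gate_bias[1])  # forget gate
        o = torch.sigmoid(self.gate_scale[2] * z + self.gate_bias[2])  # output gate
        g = torch.tanh(self.gate_scale[3] * z + self.gate_bias[3])     # candidate
        # Standard update plus an additive term, so long-range content is
        # not attenuated solely by repeated multiplication with f < 1.
        c_next = f * c + i * g + self.skip(x)
        h_next = o * torch.tanh(c_next)
        return h_next, (h_next, c_next)
```

With input_size = hidden_size = d, a standard LSTM spends roughly 8d^2 parameters on its four gate projections, while this sketch spends roughly 3d^2 on the shared projection plus skip; the precise accounting behind the reported 48 percent reduction depends on design details the abstract does not state.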
Related papers
- LSTM-MAS: A Long Short-Term Memory Inspired Multi-Agent System for Long-Context Understanding [24.027208865014064]
Large language models (LLMs) struggle with long contexts because of error accumulation and hallucination propagation. We design a Multi-Agent System called LSTM-MAS, emulating LSTM's hierarchical information flow and gated memory mechanisms for long-context understanding. Our model achieves improvements of 40.93%, 43.70%, 121.57%, and 33.12% on NarrativeQA, Qasper, HotpotQA, and MuSiQue, respectively.
arXiv Detail & Related papers (2026-01-17T05:16:23Z)
- QKAN-LSTM: Quantum-inspired Kolmogorov-Arnold Long Short-term Memory [11.996286932948124]
Long short-term memory (LSTM) models are central to sequential modeling tasks in domains such as urban telecommunication forecasting. We propose the Quantum-inspired Kolmogorov-Arnold Long Short-Term Memory (QKAN-LSTM). QKAN-LSTM integrates Data Re-Uploading Activation modules into the gating structure of LSTMs.
arXiv Detail & Related papers (2025-12-04T18:03:23Z)
- Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints [51.83081671798784]
Diffusion Transformers (DiT) have emerged as a powerful architecture for image and video generation, offering superior quality and scalability. DiT's practical application suffers from inherent dynamic feature instability, leading to error amplification during cached inference. We propose Skip-DiT, an image and video generative DiT variant enhanced with Long-Skip-Connections (LSCs) - the key efficiency component in U-Nets.
arXiv Detail & Related papers (2024-11-26T17:28:10Z)
- Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training [78.93900796545523]
Mini-Sequence Transformer (MsT) is a methodology for highly efficient and accurate LLM training with extremely long sequences.
MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage.
Integrated with the Hugging Face library, MsT successfully extends the maximum context length of Qwen, Mistral, and Gemma-2 by 12-24x.
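As a rough illustration of the partitioning idea (not MsT's actual kernels), processing a position-wise block in mini-sequences with recomputation keeps only one chunk's intermediate activations live at a time; the helper below is a hypothetical sketch:

```python
import torch
from torch.utils.checkpoint import checkpoint

def chunked_forward(block, x, num_chunks=4):
    """Apply a position-wise module to a long sequence in mini-sequences.

    x has shape (batch, seq_len, dim). Checkpointing recomputes each
    chunk's intermediates during backward, so only one chunk's
    activations are live at a time instead of the full sequence's.
    """
    parts = torch.chunk(x, num_chunks, dim=1)  # split along sequence length
    outs = [checkpoint(block, p, use_reentrant=False) for p in parts]
    return torch.cat(outs, dim=1)
```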
arXiv Detail & Related papers (2024-07-22T01:52:30Z)
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- xLSTM: Extended Long Short-Term Memory [26.607656211983155]
In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). We introduce exponential gating with appropriate normalization and stabilization techniques. We modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, and (ii) mLSTM, which is fully parallelizable with a matrix memory and a covariance update rule.
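Exponential gates are unbounded, so they need a running log-scale stabilizer. The step below is a simplified sketch of how such stabilization is commonly written for sLSTM-style cells; names and details are assumptions rather than the paper's exact formulation.

```python
import torch

def stabilized_exp_gating_step(z, i_raw, f_raw, o, c, n, m):
    """One sLSTM-style cell update with stabilized exponential gates.

    z: candidate values; i_raw, f_raw: raw gate pre-activations;
    o: output gate in (0, 1); c: cell state; n: normalizer state;
    m: running log-scale stabilizer. All tensors share one shape.
    """
    m_new = torch.maximum(f_raw + m, i_raw)  # track the largest log scale
    i = torch.exp(i_raw - m_new)             # stabilized exponential input gate
    f = torch.exp(f_raw + m - m_new)         # stabilized exponential forget gate
    c_new = f * c + i * z                    # cell update
    n_new = f * n + i                        # normalizer update
    h = o * (c_new / n_new)                  # normalized hidden state
    return h, c_new, n_new, m_new
```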
arXiv Detail & Related papers (2024-05-07T17:50:21Z)
- Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
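A common building block in this line of work splits a series into a moving-average trend and a seasonal remainder. The helper below is a generic sketch of that decomposition, not necessarily the scheme this paper tailors:

```python
import torch
import torch.nn.functional as F

def decompose(x: torch.Tensor, kernel_size: int = 25):
    """Split a (batch, seq_len, channels) series into seasonal + trend."""
    pad = (kernel_size - 1) // 2
    xt = x.transpose(1, 2)  # (batch, channels, seq_len) for 1D pooling
    # Replicate-pad the ends so the moving average keeps the length.
    xt = F.pad(xt, (pad, kernel_size - 1 - pad), mode="replicate")
    trend = F.avg_pool1d(xt, kernel_size, stride=1).transpose(1, 2)
    return x - trend, trend  # seasonal remainder, moving-average trend
```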
arXiv Detail & Related papers (2024-01-22T13:15:40Z)
- RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task.
Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve better generalization.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
- Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory (LSTM) training framework that involves a novel
ANN-to-SNN conversion framework, followed by SNN training.
We evaluate our framework on sequential learning tasks including temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z)
- Self-Gated Memory Recurrent Network for Efficient Scalable HDR Deghosting [59.04604001936661]
We propose a novel recurrent network-based HDR deghosting method for fusing dynamic sequences of arbitrary length.
We introduce a new recurrent cell architecture, the Self-Gated Memory (SGM) cell, which outperforms the standard LSTM cell.
The proposed approach quantitatively achieves state-of-the-art performance over existing HDR deghosting methods across three publicly available datasets.
arXiv Detail & Related papers (2021-12-24T12:36:33Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
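The idea of letting gates read the cell state can be sketched as an extra learned connection, in the spirit of peephole connections; the exact formulation in the paper may differ:

```python
import torch
import torch.nn as nn

class GateWithCellState(nn.Module):
    """A gate that also reads the previous cell state (sketch only)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.wx = nn.Linear(input_size, hidden_size)
        self.wh = nn.Linear(hidden_size, hidden_size, bias=False)
        # Extra learned connection carrying the cell state into the gate.
        self.wc = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, h, c):
        # tanh keeps the unbounded cell state in a bounded range before
        # it influences the gate (an assumption of this sketch).
        return torch.sigmoid(self.wx(x) + self.wh(h) + self.wc(torch.tanh(c)))
```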
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Sentiment Analysis Using Simplified Long Short-term Memory Recurrent Neural Networks [1.5146765382501612]
We perform sentiment analysis on a GOP Debate Twitter dataset.
To speed up training and reduce computational cost and time, six parameter-reduced slim versions of the LSTM model are proposed.
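One representative way to slim an LSTM gate is to replace its full input and hidden projections with a per-unit weight and bias; the sketch below shows that pattern, though the paper's six variants differ in exactly which terms they keep:

```python
import torch
import torch.nn as nn

class SlimGate(nn.Module):
    """A parameter-reduced gate: per-unit weight on the previous hidden
    state plus a bias, instead of full input and hidden matrices."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(hidden_size))   # pointwise weight
        self.b = nn.Parameter(torch.zeros(hidden_size))  # bias

    def forward(self, h):
        return torch.sigmoid(self.w * h + self.b)
```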
arXiv Detail & Related papers (2020-05-08T12:50:10Z)