Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST)
Activation Under Data Constraints
- URL: http://arxiv.org/abs/2402.09034v1
- Date: Wed, 14 Feb 2024 09:20:13 GMT
- Title: Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST)
Activation Under Data Constraints
- Authors: Barathi Subramanian, Rathinaraja Jeyaraj, Rakhmonov Akhrorjon
Akhmadjon Ugli, and Jeonghong Kim
- Abstract summary: We propose squared Sigmoid TanH (SST) activation specifically tailored to enhance the learning capability of sequential models under data constraints.
SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time.
We evaluate SST-powered LSTMs and GRUs for diverse applications, such as sign language recognition, regression, and time-series classification tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions enable neural networks to learn complex representations
by introducing non-linearities. While feedforward models commonly use rectified
linear units, sequential models like recurrent neural networks (RNNs), long
short-term memory networks (LSTMs), and gated recurrent units (GRUs) still rely on Sigmoid and TanH
activation functions. However, these classical activation functions often
struggle to model sparse patterns and to capture temporal dependencies
effectively when trained on small sequential datasets. To address this
limitation, we
propose squared Sigmoid TanH (SST) activation specifically tailored to enhance
the learning capability of sequential models under data constraints. SST
applies mathematical squaring to amplify differences between strong and weak
activations as signals propagate over time, facilitating improved gradient flow
and information filtering. We evaluate SST-powered LSTMs and GRUs for diverse
applications, such as sign language recognition, regression, and time-series
classification tasks, where the dataset is limited. Our experiments demonstrate
that SST models consistently outperform RNN-based models with baseline
activations, exhibiting improved test accuracy.
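As a concrete illustration, here is a minimal NumPy sketch of the squaring idea applied to one LSTM step. It assumes SST simply squares the outputs of the standard Sigmoid and TanH nonlinearities; note that squaring TanH as written discards the sign, so the paper's exact formulation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sst_sigmoid(x):
    # Squared sigmoid: squaring pushes weak activations (< 1) further
    # toward 0 while leaving strong activations comparatively large,
    # sharpening the gate's filtering behavior.
    return sigmoid(x) ** 2

def sst_tanh(x):
    # Squared tanh: output lies in [0, 1]; squaring discards the sign
    # (an assumption of this sketch).
    return np.tanh(x) ** 2

def sst_lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step with SST activations (hypothetical placement:
    squared sigmoid for the gates, squared tanh for the candidate)."""
    z = W @ x + U @ h_prev + b          # stacked pre-activations, shape (4H,)
    H = h_prev.shape[0]
    f = sst_sigmoid(z[0*H:1*H])         # forget gate
    i = sst_sigmoid(z[1*H:2*H])         # input gate
    o = sst_sigmoid(z[2*H:3*H])         # output gate
    g = sst_tanh(z[3*H:4*H])            # candidate cell state
    c = f * c_prev + i * g
    h = o * sst_tanh(c)
    return h, c
```

By the chain rule, d/dx [sigmoid(x)^2] = 2*sigmoid(x)*sigmoid'(x), so stronger activations also pass proportionally larger gradients than weaker ones, which matches the claimed effect on gradient flow and information filtering.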
Related papers
- Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate [17.611912733951662]
Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies.
We propose a novel Delayed Memory Unit (DMU) in this paper to enhance the temporal modeling capabilities of vanilla RNNs.
Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks.
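One plausible (hypothetical) reading of a delay-gated unit, not the paper's exact equations: keep a buffer of the last D hidden states and let a learned gate decide which delays the update reads from.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dmu_step(x, h_buffer, Wx, Wh, Wd, bd):
    """One step of a delay-gated recurrent unit (a hypothetical sketch
    of the DMU idea; parameter names are illustrative).

    h_buffer: list of the last D hidden states, h_buffer[0] most recent.
    The delay gate assigns weights over the D delays, so the update can
    draw on states several steps back instead of only h_{t-1}."""
    d = softmax(Wd @ x + bd)                     # weights over D delays
    h_delayed = sum(w * h for w, h in zip(d, h_buffer))
    h_new = np.tanh(Wx @ x + Wh @ h_delayed)
    h_buffer = [h_new] + h_buffer[:-1]           # shift the delay buffer
    return h_new, h_buffer
```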
arXiv Detail & Related papers (2023-10-23T14:29:48Z)
- Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST outperforms existing state-of-the-art methods by convincing margins.
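The efficiency claim follows from keeping the heavy pre-trained spatial encoder frozen and training only a light temporal module. A hypothetical PyTorch sketch of that pattern (names and shapes are assumptions, not DiST's actual architecture):

```python
import torch
import torch.nn as nn

class DisentangledVideoHead(nn.Module):
    """Frozen pre-trained spatial encoder + small trainable temporal
    module, so gradients never flow through the large backbone.
    Assumes the encoder maps (N, C, H, W) -> (N, feat_dim)."""
    def __init__(self, spatial_encoder, feat_dim=768, num_classes=400):
        super().__init__()
        self.spatial = spatial_encoder.eval()
        for p in self.spatial.parameters():
            p.requires_grad = False          # no backprop through backbone
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frames):               # frames: (B, T, C, H, W)
        B, T = frames.shape[:2]
        with torch.no_grad():                # spatial features are fixed
            f = self.spatial(frames.flatten(0, 1)).reshape(B, T, -1)
        out, _ = self.temporal(f)            # only this part is trained
        return self.head(out[:, -1])
```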
arXiv Detail & Related papers (2023-09-14T17:58:33Z)
- Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection [67.60791405198063]
We propose a correlation-aware spatial-temporal graph learning method (termed CST-GL) for time series anomaly detection.
CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module.
A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner.
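A hedged sketch of what a pairwise correlation learning module could look like: combine empirical correlations of a series window with similarities of learned node embeddings to form a graph adjacency (illustrative only; CST-GL's actual module may differ).

```python
import numpy as np

def correlation_graph(X, E):
    """Pairwise-correlation graph for N series.

    X: (N, T) multivariate time series window; E: (N, d) learnable node
    embeddings (a common graph-learning device, assumed here). Returns
    an (N, N) adjacency combining empirical correlation with embedding
    similarity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(Xc, axis=1, keepdims=True) + 1e-8
    corr = (Xc @ Xc.T) / (denom @ denom.T)   # empirical correlation
    sim = E @ E.T                             # learned similarity
    A = np.maximum(corr * sim, 0.0)           # keep positive relations
    np.fill_diagonal(A, 0.0)
    return A
```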
arXiv Detail & Related papers (2023-07-17T11:04:27Z)
- Switching Autoregressive Low-rank Tensor Models [12.461139675114818]
We propose switching autoregressive low-rank tensor (SALT) models.
SALT parameterizes the tensor of an ARHMM with a low-rank factorization to control the number of parameters.
We prove theoretical connections and discuss practical connections between SALT, linear dynamical systems, and SLDSs.
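The low-rank idea itself is simple to illustrate. In the sketch below (illustrative names, not the paper's notation), each discrete state's D x (L*D) autoregressive matrix is factored into rank-r pieces, shrinking the parameter count:

```python
import numpy as np

def salt_predict(x_hist, U, V, b, z):
    """Low-rank autoregressive prediction for one discrete state of a
    switching model (hypothetical sketch).

    x_hist: (L, D) last L observations. For state z the full D x (L*D)
    autoregressive matrix is factored as U[z] @ V[z] with rank r, so the
    per-state parameter count drops from D*L*D to r*(D + L*D)."""
    phi = x_hist[::-1].reshape(-1)        # stack lags, most recent first
    return U[z] @ (V[z] @ phi) + b[z]     # U[z]: (D, r), V[z]: (r, L*D)

# Parameter-count check for D=20, L=5, r=4 (per state):
D, L, r = 20, 5, 4
print(D * L * D, r * (D + L * D))         # 2000 vs. 480 parameters
```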
arXiv Detail & Related papers (2023-06-05T22:25:28Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
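The convolutional speedup rests on a standard S4-style identity: a linear state space unrolls into a convolution kernel over the input. A naive NumPy sketch (Euler discretization; S4/LS4 use more careful discretizations and structured state matrices):

```python
import numpy as np

def ssm_kernel(A, B, C, L):
    """Unroll the discretized linear state space z' = Az + Bu, y = Cz
    into a length-L convolution kernel k_t = C @ Ad^t @ Bd, so a whole
    sequence can be processed with one convolution instead of a scan."""
    dt = 1.0
    Ad = np.eye(A.shape[0]) + dt * A      # naive Euler discretization
    Bd = dt * B
    K, M = [], Bd
    for _ in range(L):
        K.append(C @ M)                   # k_t = C Ad^t Bd
        M = Ad @ M
    return np.array(K).squeeze()

def ssm_apply(K, u):
    # y = K * u (causal convolution over the input sequence)
    return np.convolve(u, K)[: len(u)]
```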
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- Simple Yet Surprisingly Effective Training Strategies for LSTMs in Sensor-Based Human Activity Recognition [14.95985947077388]
This paper studies some LSTM training strategies for sporadic activity recognition.
We propose two simple yet effective LSTM variants, namely delay model and inverse model, for two SAR scenarios.
The promising results demonstrate the effectiveness of our approaches in HAR applications.
arXiv Detail & Related papers (2022-12-23T09:17:01Z)
- Towards Energy-Efficient, Low-Latency and Accurate Spiking LSTMs [1.7969777786551424]
Spiking Neural Networks (SNNs) have emerged as an attractive spatio-temporal computing paradigm for complex vision tasks.
We propose an optimized spiking long short-term memory (LSTM) training framework that involves a novel ANN-to-SNN conversion framework, followed by SNN training.
We evaluate our framework on sequential learning tasks including the temporal MNIST, Google Speech Commands (GSC), and UCI Smartphone datasets on different LSTM architectures.
arXiv Detail & Related papers (2022-10-23T04:10:27Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep Bayesian active learning framework for accelerating stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP-based models.
The results demonstrate that the spatiotemporal neural process (STNP) outperforms the baselines in the learning setting and that LIG achieves state-of-the-art performance for active learning.
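One plausible reading of a latent-space acquisition score, sketched below: rank each candidate by the KL divergence between the latent distribution updated with that candidate and the current one. The `encode` callback and the diagonal-Gaussian form are assumptions of this sketch, not the paper's exact estimator.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def latent_information_gain(candidates, encode, mu0, var0):
    """Score each candidate design by how much it would shift the model's
    latent distribution; `encode` maps a candidate (with its imagined
    outcome) to an updated posterior (mu, var)."""
    scores = []
    for x in candidates:
        mu1, var1 = encode(x)             # posterior if x were observed
        scores.append(gaussian_kl(mu1, var1, mu0, var0))
    return int(np.argmax(scores))         # pick the most informative design
```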
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data [1.8352113484137622]
In this paper, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data.
The proposed model, called CARRNN, uses a generalized discrete-time autoregressive model that is trainable end-to-end using neural networks modulated by time lags.
It is applied to multivariate time-series regression tasks using data provided for Alzheimer's disease progression modeling and intensive care unit (ICU) mortality rate prediction.
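A hypothetical sketch of a time-lag-modulated recurrent update for sporadic (irregularly sampled) data: scale the recurrent contribution by a learnable decay of the elapsed gap dt (illustrative form, not CARRNN's exact equations).

```python
import numpy as np

def carrnn_step(x, h_prev, dt, Wx, Wh, Wt, b):
    """Autoregressive update modulated by the time lag dt: observations
    far apart in time influence each other less. Wt holds learnable
    per-unit decay rates (an assumption of this sketch)."""
    decay = np.exp(-np.abs(Wt) * dt)      # per-unit decay over the gap
    return np.tanh(Wx @ x + Wh @ (decay * h_prev) + b)
```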
arXiv Detail & Related papers (2021-04-08T12:43:44Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
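To give a feel for the tensor-train idea (a loose sketch, not the paper's actual module): combine T past feature vectors through a chain of small cores instead of one huge T*D-to-D map, keeping cost linear in T.

```python
import numpy as np

def tensor_train_mix(feats, cores):
    """Combine T past (pooled) convolutional feature vectors with a
    tensor-train chain of small cores (hypothetical sketch).

    feats: (T, D); cores[t]: (r, D, r) for t = 0..T-1."""
    T, D = feats.shape
    r = cores[0].shape[0]
    v = np.ones(r) / r                    # boundary rank vector
    for t in range(T):
        # contract this step's core with its features, then with the chain
        M = np.einsum('idj,d->ij', cores[t], feats[t])   # (r, r)
        v = v @ M                                         # carry rank state
    return v                              # compact mixed representation
```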
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.