Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST)
Activation Under Data Constraints
- URL: http://arxiv.org/abs/2402.09034v1
- Date: Wed, 14 Feb 2024 09:20:13 GMT
- Title: Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST)
Activation Under Data Constraints
- Authors: Barathi Subramanian, Rathinaraja Jeyaraj, Rakhmonov Akhrorjon
Akhmadjon Ugli, and Jeonghong Kim
- Abstract summary: We propose squared Sigmoid TanH (SST) activation specifically tailored to enhance the learning capability of sequential models under data constraints.
SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time.
We evaluate SST-powered LSTMs and GRUs for diverse applications, such as sign language recognition, regression, and time-series classification tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions enable neural networks to learn complex representations
by introducing non-linearities. While feedforward models commonly use rectified
linear units, sequential models like recurrent neural networks, long short-term
memory networks (LSTMs), and gated recurrent units (GRUs) still rely on Sigmoid and TanH
activation functions. However, these classical activation functions often
struggle to model sparse patterns and to effectively capture temporal
dependencies when trained on small sequential datasets. To address this limitation, we
propose squared Sigmoid TanH (SST) activation specifically tailored to enhance
the learning capability of sequential models under data constraints. SST
applies mathematical squaring to amplify differences between strong and weak
activations as signals propagate over time, facilitating improved gradient flow
and information filtering. We evaluate SST-powered LSTMs and GRUs for diverse
applications, such as sign language recognition, regression, and time-series
classification tasks, where the dataset is limited. Our experiments demonstrate
that SST models consistently outperform RNN-based models with baseline
activations, exhibiting improved test accuracy.
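To show where the squaring enters a recurrent cell, below is a minimal NumPy sketch of a single LSTM step with SST-style activations. It assumes SST squares the sigmoid gate outputs and uses a sign-preserving squared tanh for the candidate and cell activations; the paper's exact formulation may differ, and the function names (`sst_sigmoid`, `sst_tanh`, `lstm_step_sst`) and the weight layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def sst_sigmoid(x):
    """Squared sigmoid: squaring pushes weak gate activations toward 0
    while keeping strong activations near 1 (illustrative form; the
    paper's exact definition may differ)."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s ** 2

def sst_tanh(x):
    """Sign-preserving squared tanh (an assumption made here so the
    candidate/cell activation keeps its sign after squaring)."""
    t = np.tanh(x)
    return np.sign(t) * t ** 2

def lstm_step_sst(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step with SST activations in place of sigmoid/tanh.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the order [input, forget, cell, output]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sst_sigmoid(z[0*H:1*H])   # input gate
    f = sst_sigmoid(z[1*H:2*H])   # forget gate
    g = sst_tanh(z[2*H:3*H])      # candidate cell state
    o = sst_sigmoid(z[3*H:4*H])   # output gate
    c_t = f * c_prev + i * g
    h_t = o * sst_tanh(c_t)
    return h_t, c_t

# Tiny usage example with random weights (D = input size, H = hidden size).
rng = np.random.default_rng(0)
D, H = 8, 16
W = rng.normal(0, 0.1, (4*H, D))
U = rng.normal(0, 0.1, (4*H, H))
b = np.zeros(4*H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                # run over a short dummy sequence
    h, c = lstm_step_sst(rng.normal(size=D), h, c, W, U, b)
print(h[:4])
```

Since a sigmoid output lies in (0, 1), squaring it suppresses weak gate values far more than strong ones, which is one way to read the abstract's claim about amplifying the gap between strong and weak activations.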
Related papers
- DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs [59.434893231950205]
Dynamic graph learning aims to uncover evolutionary laws in real-world systems.
We propose DyG-Mamba, a new continuous state space model for dynamic graph learning.
We show that DyG-Mamba achieves state-of-the-art performance on most datasets.
arXiv Detail & Related papers (2024-08-13T15:21:46Z)
- Detecting Anomalies in Dynamic Graphs via Memory enhanced Normality [39.476378833827184]
Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes.
We introduce a novel spatial-temporal memories-enhanced graph autoencoder (STRIPE).
STRIPE significantly outperforms existing methods, with a 5.8% improvement in AUC scores and 4.62x faster training time.
arXiv Detail & Related papers (2024-03-14T02:26:10Z)
- ELiSe: Efficient Learning of Sequences in Structured Recurrent Networks [1.5931140598271163]
We build a model for efficient learning of sequences using only local, always-on, and phase-free plasticity.
We showcase the capabilities of ELiSe in a mock-up of birdsong learning, and demonstrate its flexibility with respect to parametrization.
arXiv Detail & Related papers (2024-02-26T17:30:34Z)
- Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection [67.60791405198063]
We propose a correlation-aware spatial-temporal graph learning framework (termed CST-GL) for time series anomaly detection.
CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module.
A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner.
arXiv Detail & Related papers (2023-07-17T11:04:27Z)
- Switching Autoregressive Low-rank Tensor Models [12.461139675114818]
We propose switching autoregressive low-rank tensor (SALT) models.
SALT parameterizes the tensor of an ARHMM with a low-rank factorization to control the number of parameters.
We prove theoretical and discuss practical connections between SALT, linear dynamical systems, and SLDSs.
arXiv Detail & Related papers (2023-06-05T22:25:28Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- Simple Yet Surprisingly Effective Training Strategies for LSTMs in Sensor-Based Human Activity Recognition [14.95985947077388]
This paper studies some LSTM training strategies for sporadic activity recognition.
We propose two simple yet effective LSTM variants, namely delay model and inverse model, for two SAR scenarios.
The promising results demonstrated the effectiveness of our approaches in HAR applications.
arXiv Detail & Related papers (2022-12-23T09:17:01Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep active learning framework for stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models.
The results demonstrate that STNP outperforms the baselines in the learning setting and that LIG achieves state-of-the-art performance for active learning.
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- CARRNN: A Continuous Autoregressive Recurrent Neural Network for Deep Representation Learning from Sporadic Temporal Data [1.8352113484137622]
In this paper, a novel deep learning-based model is developed for modeling multiple temporal features in sporadic data.
The proposed model, called CARRNN, uses a generalized discrete-time autoregressive model that is trainable end-to-end using neural networks modulated by time lags.
It is applied to multivariate time-series regression tasks using data provided for Alzheimer's disease progression modeling and intensive care unit (ICU) mortality rate prediction.
arXiv Detail & Related papers (2021-04-08T12:43:44Z)
- Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvements when using SRTG blocks, with only a minimal increase in the number of GFLOPs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.