Improving Deep Learning for HAR with shallow LSTMs
- URL: http://arxiv.org/abs/2108.00702v2
- Date: Thu, 5 Aug 2021 11:00:12 GMT
- Title: Improving Deep Learning for HAR with shallow LSTMs
- Authors: Marius Bock, Alexander Hoelzemann, Michael Moeller, Kristof Van
Laerhoven
- Abstract summary: We propose to alter the DeepConvLSTM to employ a 1-layered instead of a 2-layered LSTM.
Our results stand in contrast to the belief that one needs at least a 2-layered LSTM when dealing with sequential data.
- Score: 70.94062293989832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in Human Activity Recognition (HAR) have shown that Deep
Learning methods are able to outperform classical Machine Learning algorithms.
One popular Deep Learning architecture in HAR is the DeepConvLSTM. In this
paper we propose to alter the DeepConvLSTM architecture to employ a 1-layered
instead of a 2-layered LSTM. We validate our architecture change on 5 publicly
available HAR datasets by comparing the predictive performance with and without
the change, employing varying numbers of hidden units within the LSTM layer(s). Results
show that across all datasets, our architecture consistently improves on the
original one: Recognition performance increases by up to 11.7% in F1-score,
and our architecture significantly decreases the number of learnable
parameters. This improvement over DeepConvLSTM decreases training time by as
much as 48%. Our results stand in contrast to the belief that one needs at
least a 2-layered LSTM when dealing with sequential data. Based on our results
we argue that said claim might not be applicable to sensor-based HAR.
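To make the proposed change concrete, the sketch below shows a DeepConvLSTM-style model in PyTorch in which the LSTM depth is a single hyperparameter (1 for the altered architecture, 2 for the original). The convolutional settings (four 1D convolutions, 64 filters, kernel size 5) and the 128 hidden units follow the commonly cited DeepConvLSTM configuration but are assumptions here, not necessarily the exact setup evaluated in the paper.

```python
import torch
import torch.nn as nn

class DeepConvLSTMLike(nn.Module):
    """Sketch of a DeepConvLSTM-style HAR model with configurable LSTM depth."""

    def __init__(self, n_channels, n_classes, n_lstm_layers=1,
                 n_filters=64, kernel_size=5, hidden_units=128):
        super().__init__()
        # Four 1D convolutions over the time axis of the sensor window.
        convs, in_ch = [], n_channels
        for _ in range(4):
            convs += [nn.Conv1d(in_ch, n_filters, kernel_size), nn.ReLU()]
            in_ch = n_filters
        self.conv = nn.Sequential(*convs)
        # The proposed change: n_lstm_layers=1 instead of the original 2.
        self.lstm = nn.LSTM(n_filters, hidden_units,
                            num_layers=n_lstm_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_units, n_classes)

    def forward(self, x):
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        feats = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(feats)
        # Classify from the last time step of the (shortened) window.
        return self.classifier(out[:, -1])

# Hypothetical usage: windows of 24 time steps over 9 sensor channels, 6 classes.
model = DeepConvLSTMLike(n_channels=9, n_classes=6, n_lstm_layers=1)
logits = model(torch.randn(8, 24, 9))  # -> shape (8, 6)
```

With the assumed H = 128 hidden units, dropping the second LSTM layer removes roughly 8H^2 + 8H (about 132k) learnable parameters, which is the kind of reduction behind the parameter and training-time savings reported in the abstract.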
Related papers
- Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
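As a concrete illustration of the weight-pruning line of work mentioned in this summary (not the architecture-search approach the paper itself proposes), here is a minimal sketch of unstructured magnitude pruning on a single linear layer using PyTorch's pruning utilities; the layer size and sparsity level are arbitrary assumptions.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for one projection matrix inside a transformer block.
layer = nn.Linear(4096, 4096)

# Zero out the 50% of weights with the smallest magnitude (unstructured pruning).
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Bake the mask into the weights so the layer stores the pruned tensor directly.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # roughly 50%
```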
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
- Are LSTMs Good Few-Shot Learners? [4.316506818580031]
In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning.
We revisit this approach and test it on modern few-shot learning benchmarks.
We find that LSTMs, surprisingly, outperform the popular meta-learning technique MAML on a simple few-shot sine wave regression benchmark, but, as expected, fall short on more complex few-shot image classification benchmarks.
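For context on the sine-wave benchmark mentioned above: a few-shot regression task is usually defined by a randomly drawn amplitude and phase, with a small support set for adaptation and a query set for evaluation. The sketch below only samples such tasks; the parameter ranges follow the common MAML setup and are assumptions rather than this paper's exact protocol.

```python
import numpy as np

def sample_sine_task(k_shot=10, k_query=10, rng=np.random.default_rng(0)):
    """Sample one few-shot sine-wave regression task (support and query sets)."""
    amplitude = rng.uniform(0.1, 5.0)   # assumed range, as in the usual MAML setup
    phase = rng.uniform(0.0, np.pi)
    x = rng.uniform(-5.0, 5.0, size=k_shot + k_query)
    y = amplitude * np.sin(x + phase)
    return (x[:k_shot], y[:k_shot]), (x[k_shot:], y[k_shot:])

(support_x, support_y), (query_x, query_y) = sample_sine_task()
print(support_x.shape, query_x.shape)  # (10,) (10,)
```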
arXiv Detail & Related papers (2023-10-22T00:16:30Z)
- Efficient shallow learning as an alternative to deep learning [0.0]
We show that the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer.
A power law with a similar exponent also characterizes the generalized VGG-16 architecture.
A conservation law along the convolutional layers, namely the square root of their size times their depth, is found to minimize error rates.
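The power-law claim in this summary can be checked with a simple log-log linear fit of error rate against the number of first-layer filters. The sketch below uses synthetic numbers purely to show the fitting procedure; it does not reproduce the paper's measurements.

```python
import numpy as np

# Hypothetical (filter count, error rate) pairs; not the paper's actual values.
filters = np.array([8, 16, 32, 64, 128, 256])
error = np.array([0.30, 0.24, 0.19, 0.15, 0.12, 0.095])

# If error ~ c * filters**p with p < 0, then log(error) is linear in log(filters).
p, log_c = np.polyfit(np.log(filters), np.log(error), deg=1)
print(f"estimated power-law exponent: {p:.3f}")  # negative slope => power-law decay
```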
arXiv Detail & Related papers (2022-11-15T10:10:27Z)
- Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
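As a rough illustration of the sequential treatment described in this summary, the sketch below reads a 28x28 image as a sequence of 28 pixel rows with an LSTM and classifies from the final hidden state; the row-wise ordering, sizes, and class count are illustrative assumptions rather than the paper's exact protocol.

```python
import torch
import torch.nn as nn

class RowSequenceLSTM(nn.Module):
    """Classify an image by reading it as a sequence of pixel rows."""

    def __init__(self, row_len=28, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(row_len, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, images):
        # images: (batch, 28, 28) read as 28 time steps of 28-dim row vectors.
        out, _ = self.lstm(images)
        return self.head(out[:, -1])  # classify from the last time step

model = RowSequenceLSTM()
logits = model(torch.randn(16, 28, 28))  # -> shape (16, 10)
```

A bidirectional variant would set bidirectional=True in the LSTM and double the input size of the classification head.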
arXiv Detail & Related papers (2022-09-23T09:42:44Z)
- LiteLSTM Architecture for Deep Recurrent Neural Networks [1.1602089225841632]
Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning sequential data.
This paper proposes a novel LiteLSTM architecture based on reducing the components of the LSTM using the weights sharing concept.
The proposed LiteLSTM can be significant for learning from big data where time consumption is crucial.
arXiv Detail & Related papers (2022-01-27T16:33:02Z)
- Multi-Perspective LSTM for Joint Visual Representation Learning [81.21490913108835]
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives.
Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level.
We show that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks.
arXiv Detail & Related papers (2021-05-06T16:44:40Z)
- Stateless Neural Meta-Learning using Second-Order Gradients [1.933681537640272]
We show that the meta-learner LSTM subsumes MAML.
We construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML.
arXiv Detail & Related papers (2021-04-21T13:34:31Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Deep transfer learning for improving single-EEG arousal detection [63.52264764099532]
The two datasets do not share exactly the same setup, which leads to degraded performance in single-EEG models.
We train a baseline model and replace the first two layers to prepare the architecture for single-channel electroencephalography data.
Using a fine-tuning strategy, our model yields performance similar to the baseline model and significantly better than a comparable single-channel model.
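A hedged sketch of the transfer step described in this summary: a pretrained multi-channel model gets its input-facing convolution swapped for a single-channel one while the remaining weights are kept, and only selected parameters are fine-tuned. The layer layout, channel counts, and optimizer settings below are illustrative assumptions, not the paper's configuration, and the two replaced layers are collapsed into one convolution for brevity.

```python
import torch
import torch.nn as nn

# Hypothetical pretrained baseline whose first conv expects 4 EEG/EOG channels.
baseline = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 2),
)
# ... load the pretrained multi-channel weights into `baseline` here ...

# Swap the input-facing layer so the network accepts single-channel EEG.
baseline[0] = nn.Conv1d(1, 32, kernel_size=7, padding=3)

# Fine-tuning strategy: update only the replaced layer and the classifier head.
for p in baseline.parameters():
    p.requires_grad = False
for p in list(baseline[0].parameters()) + list(baseline[-1].parameters()):
    p.requires_grad = True

optimizer = torch.optim.Adam(
    [p for p in baseline.parameters() if p.requires_grad], lr=1e-4)

logits = baseline(torch.randn(8, 1, 3000))  # 8 windows of single-channel EEG
```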
arXiv Detail & Related papers (2020-04-10T16:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.