Improving Deep Learning for HAR with shallow LSTMs
- URL: http://arxiv.org/abs/2108.00702v2
- Date: Thu, 5 Aug 2021 11:00:12 GMT
- Title: Improving Deep Learning for HAR with shallow LSTMs
- Authors: Marius Bock, Alexander Hoelzemann, Michael Moeller, Kristof Van
Laerhoven
- Abstract summary: We propose to alter the DeepConvLSTM to employ a 1-layered instead of a 2-layered LSTM.
Our results stand in contrast to the belief that one needs at least a 2-layered LSTM when dealing with sequential data.
- Score: 70.94062293989832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in Human Activity Recognition (HAR) have shown that Deep
Learning methods are able to outperform classical Machine Learning algorithms.
One popular Deep Learning architecture in HAR is the DeepConvLSTM. In this
paper we propose to alter the DeepConvLSTM architecture to employ a 1-layered
instead of a 2-layered LSTM. We validate our architecture change on 5 publicly
available HAR datasets by comparing the predictive performance with and without
the change, employing varying numbers of hidden units within the LSTM layer(s). Results
show that across all datasets, our architecture consistently improves on the
original one: Recognition performance increases by up to 11.7% in F1-score,
and our architecture significantly decreases the number of learnable
parameters. This improvement over DeepConvLSTM decreases training time by as
much as 48%. Our results stand in contrast to the belief that one needs at
least a 2-layered LSTM when dealing with sequential data. Based on our results
we argue that said claim might not be applicable to sensor-based HAR.
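To make the proposed change concrete, the sketch below shows a DeepConvLSTM-style model in PyTorch in which the LSTM depth is a single hyperparameter (1 for the altered architecture, 2 for the original). The convolutional settings (four 1D convolutions, 64 filters, kernel size 5) and the 128 hidden units follow the commonly cited DeepConvLSTM configuration but are assumptions here, not necessarily the exact setup evaluated in the paper.

```python
import torch
import torch.nn as nn

class DeepConvLSTMLike(nn.Module):
    """Sketch of a DeepConvLSTM-style HAR model with configurable LSTM depth."""

    def __init__(self, n_channels, n_classes, n_lstm_layers=1,
                 n_filters=64, kernel_size=5, hidden_units=128):
        super().__init__()
        # Four 1D convolutions over the time axis of the sensor window.
        convs, in_ch = [], n_channels
        for _ in range(4):
            convs += [nn.Conv1d(in_ch, n_filters, kernel_size), nn.ReLU()]
            in_ch = n_filters
        self.conv = nn.Sequential(*convs)
        # The proposed change: n_lstm_layers=1 instead of the original 2.
        self.lstm = nn.LSTM(n_filters, hidden_units,
                            num_layers=n_lstm_layers, batch_first=True)
        self.classifier = nn.Linear(hidden_units, n_classes)

    def forward(self, x):
        # x: (batch, time, channels); Conv1d expects (batch, channels, time).
        feats = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(feats)
        # Classify from the last time step of the (shortened) window.
        return self.classifier(out[:, -1])

# Hypothetical usage: windows of 24 time steps over 9 sensor channels, 6 classes.
model = DeepConvLSTMLike(n_channels=9, n_classes=6, n_lstm_layers=1)
logits = model(torch.randn(8, 24, 9))  # -> shape (8, 6)
```

With the assumed H = 128 hidden units, dropping the second LSTM layer removes roughly 8H^2 + 8H (about 132k) learnable parameters, which is the kind of reduction behind the parameter and training-time savings reported in the abstract.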
Related papers
- Search for Efficient Large Language Models [52.98684997131108]
Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research.
Weight pruning, quantization, and distillation have been embraced to compress LLMs, targeting memory reduction and inference acceleration.
Most model compression techniques concentrate on weight optimization, overlooking the exploration of optimal architectures.
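As a concrete illustration of the weight-pruning line of work mentioned in this summary (not the architecture-search approach the paper itself proposes), here is a minimal sketch of unstructured magnitude pruning on a single linear layer using PyTorch's pruning utilities; the layer size and sparsity level are arbitrary assumptions.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for one projection matrix inside a transformer block.
layer = nn.Linear(4096, 4096)

# Zero out the 50% of weights with the smallest magnitude (unstructured pruning).
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Bake the mask into the weights so the layer stores the pruned tensor directly.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # roughly 50%
```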
arXiv Detail & Related papers (2024-09-25T21:32:12Z)
- Are LSTMs Good Few-Shot Learners? [4.316506818580031]
In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning.
We revisit this approach and test it on modern few-shot learning benchmarks.
We find that LSTMs, surprisingly, outperform the popular meta-learning technique MAML on a simple few-shot sine wave regression benchmark, but, as expected, fall short on more complex few-shot image classification benchmarks.
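For context on the sine-wave benchmark mentioned above: a few-shot regression task is usually defined by a randomly drawn amplitude and phase, with a small support set for adaptation and a query set for evaluation. The sketch below only samples such tasks; the parameter ranges follow the common MAML setup and are assumptions rather than this paper's exact protocol.

```python
import numpy as np

def sample_sine_task(k_shot=10, k_query=10, rng=np.random.default_rng(0)):
    """Sample one few-shot sine-wave regression task (support and query sets)."""
    amplitude = rng.uniform(0.1, 5.0)   # assumed range, as in the usual MAML setup
    phase = rng.uniform(0.0, np.pi)
    x = rng.uniform(-5.0, 5.0, size=k_shot + k_query)
    y = amplitude * np.sin(x + phase)
    return (x[:k_shot], y[:k_shot]), (x[k_shot:], y[k_shot:])

(support_x, support_y), (query_x, query_y) = sample_sine_task()
print(support_x.shape, query_x.shape)  # (10,) (10,)
```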
arXiv Detail & Related papers (2023-10-22T00:16:30Z)
- Efficient shallow learning as an alternative to deep learning [0.0]
We show that the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer.
A power law with a similar exponent also characterizes the generalized VGG-16 architecture.
A conservation law along the convolutional layers, namely the square root of their size times their depth, is found to minimize error rates.
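The power-law claim in this summary can be checked with a simple log-log linear fit of error rate against the number of first-layer filters. The sketch below uses synthetic numbers purely to show the fitting procedure; it does not reproduce the paper's measurements.

```python
import numpy as np

# Hypothetical (filter count, error rate) pairs; not the paper's actual values.
filters = np.array([8, 16, 32, 64, 128, 256])
error = np.array([0.30, 0.24, 0.19, 0.15, 0.12, 0.095])

# If error ~ c * filters**p with p < 0, then log(error) is linear in log(filters).
p, log_c = np.polyfit(np.log(filters), np.log(error), deg=1)
print(f"estimated power-law exponent: {p:.3f}")  # negative slope => power-law decay
```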
arXiv Detail & Related papers (2022-11-15T10:10:27Z)
- Image Classification using Sequence of Pixels [3.04585143845864]
This study compares sequential image classification methods based on recurrent neural networks.
We describe methods based on Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, among others.
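As a rough illustration of the sequential treatment described in this summary, the sketch below reads a 28x28 image as a sequence of 28 pixel rows with an LSTM and classifies from the final hidden state; the row-wise ordering, sizes, and class count are illustrative assumptions rather than the paper's exact protocol.

```python
import torch
import torch.nn as nn

class RowSequenceLSTM(nn.Module):
    """Classify an image by reading it as a sequence of pixel rows."""

    def __init__(self, row_len=28, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(row_len, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, images):
        # images: (batch, 28, 28) read as 28 time steps of 28-dim row vectors.
        out, _ = self.lstm(images)
        return self.head(out[:, -1])  # classify from the last time step

model = RowSequenceLSTM()
logits = model(torch.randn(16, 28, 28))  # -> shape (16, 10)
```

A bidirectional variant would set bidirectional=True in the LSTM and double the input size of the classification head.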
arXiv Detail & Related papers (2022-09-23T09:42:44Z)
- LiteLSTM Architecture for Deep Recurrent Neural Networks [1.1602089225841632]
Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning sequential data.
This paper proposes a novel LiteLSTM architecture based on reducing the components of the LSTM using the weights sharing concept.
The proposed LiteLSTM can be significant for learning from big data where time consumption is crucial.
arXiv Detail & Related papers (2022-01-27T16:33:02Z)
- Multi-Perspective LSTM for Joint Visual Representation Learning [81.21490913108835]
We present a novel LSTM cell architecture capable of learning both intra- and inter-perspective relationships available in visual sequences captured from multiple perspectives.
Our architecture adopts a novel recurrent joint learning strategy that uses additional gates and memories at the cell level.
We show that by using the proposed cell to create a network, more effective and richer visual representations are learned for recognition tasks.
arXiv Detail & Related papers (2021-05-06T16:44:40Z)
- Stateless Neural Meta-Learning using Second-Order Gradients [1.933681537640272]
We show that the meta-learner LSTM subsumes MAML.
We construct a new algorithm (dubbed TURTLE) which is simpler than the meta-learner LSTM yet more expressive than MAML.
arXiv Detail & Related papers (2021-04-21T13:34:31Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Deep transfer learning for improving single-EEG arousal detection [63.52264764099532]
The two datasets do not share exactly the same setup, which leads to degraded performance in single-EEG models.
We train a baseline model and replace the first two layers to prepare the architecture for single-channel electroencephalography data.
Using a fine-tuning strategy, our model yields performance similar to the baseline model and significantly better than a comparable single-channel model.
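A hedged sketch of the transfer step described in this summary: a pretrained multi-channel model gets its input-facing convolution swapped for a single-channel one while the remaining weights are kept, and only selected parameters are fine-tuned. The layer layout, channel counts, and optimizer settings below are illustrative assumptions, not the paper's configuration, and the two replaced layers are collapsed into one convolution for brevity.

```python
import torch
import torch.nn as nn

# Hypothetical pretrained baseline whose first conv expects 4 EEG/EOG channels.
baseline = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=7, padding=3), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 2),
)
# ... load the pretrained multi-channel weights into `baseline` here ...

# Swap the input-facing layer so the network accepts single-channel EEG.
baseline[0] = nn.Conv1d(1, 32, kernel_size=7, padding=3)

# Fine-tuning strategy: update only the replaced layer and the classifier head.
for p in baseline.parameters():
    p.requires_grad = False
for p in list(baseline[0].parameters()) + list(baseline[-1].parameters()):
    p.requires_grad = True

optimizer = torch.optim.Adam(
    [p for p in baseline.parameters() if p.requires_grad], lr=1e-4)

logits = baseline(torch.randn(8, 1, 3000))  # 8 windows of single-channel EEG
```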
arXiv Detail & Related papers (2020-04-10T16:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.