Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
- URL: http://arxiv.org/abs/2410.13605v1
- Date: Thu, 17 Oct 2024 14:39:55 GMT
- Title: Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges
- Authors: Clayton Souza Leite, Henry Mauranen, Aziza Zhanabatyrova, Yu Xiao
- Abstract summary: Transformers have excelled in natural language processing and computer vision, paving their way to sensor-based Human Activity Recognition (HAR).
Previous studies show that transformers outperform their counterparts exclusively when they harness abundant data or employ compute-intensive optimization algorithms.
However, neither of these scenarios is viable in sensor-based HAR due to the scarcity of data in this field and the frequent need to perform training and inference on resource-constrained devices.
- Score: 0.5983301154764783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have excelled in natural language processing and computer vision, paving their way to sensor-based Human Activity Recognition (HAR). Previous studies show that transformers outperform their counterparts exclusively when they harness abundant data or employ compute-intensive optimization algorithms. However, neither of these scenarios is viable in sensor-based HAR due to the scarcity of data in this field and the frequent need to perform training and inference on resource-constrained devices. Our extensive investigation into various implementations of transformer-based versus non-transformer-based HAR using wearable sensors, encompassing more than 500 experiments, corroborates these concerns. We observe that transformer-based solutions pose higher computational demands, consistently yield inferior performance, and experience significant performance degradation when quantized to accommodate resource-constrained devices. Additionally, transformers demonstrate lower robustness to adversarial attacks, posing a potential threat to user trust in HAR.
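As a concrete illustration of the quantization step the abstract refers to, here is a minimal, hedged sketch (the model, shapes, and names are placeholders of ours, not the authors' code) that applies PyTorch post-training dynamic quantization to a small HAR classifier:

```python
# Minimal sketch (not the authors' code): post-training dynamic quantization
# of a small sensor-based HAR model, the kind of compression step whose
# accuracy cost the paper measures on resource-constrained devices.
import torch
import torch.nn as nn

class TinyHARNet(nn.Module):
    """Placeholder HAR classifier: 3-axis accelerometer windows -> activity logits."""
    def __init__(self, n_channels=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                  # x: (batch, channels, time)
        z = self.features(x).squeeze(-1)   # (batch, 32)
        return self.classifier(z)

model = TinyHARNet().eval()

# Dynamic quantization: weights stored in int8, activations quantized on the fly.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 128)                 # one 128-sample window of 3-axis data
print(model(x).shape, quantized(x).shape)  # both: torch.Size([1, 6])
```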
Related papers
- Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection [14.363929799618283]
We propose CHain Of thOught Symbol dEtection (CHOOSE), a CoT-enhanced shallow Transformer framework for wireless symbol detection. By introducing autoregressive latent reasoning steps within the hidden space, CHOOSE significantly improves the reasoning capacity of shallow models. Experimental results demonstrate that our approach outperforms conventional shallow Transformers and achieves performance comparable to that of deep Transformers.
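A rough sketch of our reading of that idea (not the authors' implementation): a shallow block is re-applied autoregressively to its own hidden state for a few latent steps before a symbol head decodes.

```python
# Hedged sketch of latent-space reasoning in a shallow model (not the
# authors' code): the same shallow block is re-applied to its own hidden
# state for several "reasoning" steps before the output head fires.
import torch
import torch.nn as nn

class LatentReasoningDetector(nn.Module):
    def __init__(self, d_model=64, n_steps=4, n_symbols=16):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.n_steps = n_steps
        self.head = nn.Linear(d_model, n_symbols)

    def forward(self, x):              # x: (batch, seq, d_model) signal features
        h = x
        for _ in range(self.n_steps):  # latent chain of thought: refine in hidden space
            h = self.block(h)
        return self.head(h)            # per-position symbol logits

logits = LatentReasoningDetector()(torch.randn(2, 8, 64))
print(logits.shape)                    # torch.Size([2, 8, 16])
```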
arXiv Detail & Related papers (2025-06-26T08:41:45Z) - On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent [51.50999191584981]
Sign Gradient Descent (SignGD) serves as an effective surrogate for Adam.
We study how SignGD optimizes a two-layer transformer on a noisy dataset.
We find that the poor generalization of SignGD is not solely due to data noise, suggesting that both SignGD and Adam require high-quality data for real-world tasks.
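For reference, the SignGD update keeps only the sign of each gradient coordinate; a minimal sketch (names are illustrative):

```python
# Minimal SignGD step (illustrative): w <- w - lr * sign(grad),
# the memory-free surrogate for Adam studied in the paper.
import torch

@torch.no_grad()
def signgd_step(params, lr=1e-3):
    for p in params:
        if p.grad is not None:
            p -= lr * p.grad.sign()
```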
arXiv Detail & Related papers (2024-10-07T09:36:43Z) - Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization [88.5582111768376]
We study the optimization of a Transformer composed of a self-attention layer with softmax followed by a fully connected layer under gradient descent on a certain data distribution model.
Our results establish a sharp condition that can distinguish between the small test error phase and the large test error regime, based on the signal-to-noise ratio in the data model.
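In generic notation (ours; the paper's exact parameterization may differ), a softmax self-attention layer followed by a fully connected layer computes:

```latex
% Generic one-layer softmax attention followed by a fully connected layer
% (our notation; the paper's exact parameterization may differ).
f(X) = W_{\mathrm{FC}} \, \mathrm{softmax}\!\left( \frac{X W_Q (X W_K)^\top}{\sqrt{d}} \right) X W_V
```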
arXiv Detail & Related papers (2024-09-28T13:24:11Z) - Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study [52.91899050612153]
We empirically study transformers within pre-trained language models (PLMs) when repurposed as encoders for Automatic Speech Recognition (ASR).
Our findings reveal a notable improvement in Character Error Rate (CER) and Word Error Rate (WER) across diverse ASR tasks when transformers from pre-trained LMs are incorporated.
This underscores the potential of leveraging the semantic prowess embedded within pre-trained transformers to advance ASR systems' capabilities.
arXiv Detail & Related papers (2024-09-26T11:31:18Z) - Transformers Learn Low Sensitivity Functions: Investigations and Implications [18.77893015276986]
Transformers achieve state-of-the-art accuracy and robustness across many tasks.
We identify the sensitivity of the model to token-wise random perturbations in the input as a unified metric.
We show that transformers have lower sensitivity than CNNs, ConvMixers, and LSTMs across both vision and language tasks.
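A hedged sketch of such a sensitivity estimate (function name and sampling scheme are ours, not the paper's code): replace one randomly chosen token and measure the mean change in the model's output.

```python
# Hedged sketch (ours, not the paper's code): estimate a model's sensitivity
# as the mean output change when one randomly chosen token is replaced
# by a random token from the vocabulary.
import torch

def token_sensitivity(model, tokens, vocab_size, n_trials=32):
    base = model(tokens)                       # tokens: (batch, seq) token ids
    total = 0.0
    for _ in range(n_trials):
        perturbed = tokens.clone()
        pos = torch.randint(tokens.size(1), (1,)).item()
        perturbed[:, pos] = torch.randint(vocab_size, (tokens.size(0),))
        total += (model(perturbed) - base).norm(dim=-1).mean().item()
    return total / n_trials
```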
arXiv Detail & Related papers (2024-03-11T17:12:09Z) - FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining [36.44039681893334]
Hyperspectral images (HSIs) contain rich spectral and spatial information.
Current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension.
We propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures.
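Our reading of the factorization, with illustrative shapes (not the authors' code): the same HSI patch yields a spectral token sequence and a spatial token sequence for two separate transformer branches.

```python
# Illustrative factorized tokenization of a hyperspectral patch
# (shapes and names are ours, not the authors' code).
import torch

hsi = torch.randn(1, 200, 9, 9)                  # (batch, bands, height, width)

spectral_tokens = hsi.flatten(2)                 # (batch, 200, 81): one token per band
spatial_tokens = hsi.flatten(2).transpose(1, 2)  # (batch, 81, 200): one token per pixel
print(spectral_tokens.shape, spatial_tokens.shape)
```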
arXiv Detail & Related papers (2023-09-18T02:05:52Z) - Transformers in Reinforcement Learning: A Survey [7.622978576824539]
Transformers have impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks.
This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability.
arXiv Detail & Related papers (2023-07-12T07:51:12Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time a transformer-based model has achieved this.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Lightweight Transformers for Human Activity Recognition on Mobile Devices [0.5505634045241288]
Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models.
We present Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture.
Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer floating-point operations (FLOPs) and parameters while outperforming current state-of-the-art results.
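A simplified sketch in the spirit of a sensor-wise design (our simplification, not the HART implementation): one lightweight encoder per sensor stream, with branch outputs fused before the classification head.

```python
# Hedged sketch of a sensor-wise transformer (our simplification of the
# idea, not the HART implementation): one lightweight encoder per sensor,
# fused before the classification head.
import torch
import torch.nn as nn

class SensorWiseTransformer(nn.Module):
    def __init__(self, n_sensors=2, d_model=32, n_classes=6):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_sensors)
        )
        self.head = nn.Linear(n_sensors * d_model, n_classes)

    def forward(self, streams):          # list of (batch, time, d_model) tensors
        pooled = [b(s).mean(dim=1) for b, s in zip(self.branches, streams)]
        return self.head(torch.cat(pooled, dim=-1))

acc, gyro = torch.randn(4, 128, 32), torch.randn(4, 128, 32)
print(SensorWiseTransformer()([acc, gyro]).shape)   # torch.Size([4, 6])
```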
arXiv Detail & Related papers (2022-09-22T09:42:08Z) - Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z) - Vision Transformer Equipped with Neural Resizer on Facial Expression Recognition Task [1.3048920509133808]
We propose a novel training framework, Neural Resizer, that supports Transformers by compensating for lost information while downscaling in a data-driven manner.
Experiments show that our Neural Resizer with the F-PDLS loss function generally improves performance across Transformer variants.
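A minimal sketch of a learned, data-driven resizer (our illustration of the concept, not the paper's module): a small residual conv block applied on top of plain bilinear downscaling.

```python
# Illustrative learned resizer (our sketch of the concept, not the paper's
# module): a conv block learns data-driven corrections to a plain bilinear
# downscale before images reach the Transformer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedResizer(nn.Module):
    def __init__(self, out_size=112):
        super().__init__()
        self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.out_size = out_size

    def forward(self, x):
        x = F.interpolate(x, size=self.out_size, mode="bilinear",
                          align_corners=False)
        return x + self.refine(x)   # residual: learn corrections to plain resizing

print(LearnedResizer()(torch.randn(1, 3, 224, 224)).shape)  # (1, 3, 112, 112)
```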
arXiv Detail & Related papers (2022-04-05T13:04:04Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z) - Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.