Lightweight Transformers for Human Activity Recognition on Mobile
Devices
- URL: http://arxiv.org/abs/2209.11750v1
- Date: Thu, 22 Sep 2022 09:42:08 GMT
- Title: Lightweight Transformers for Human Activity Recognition on Mobile
Devices
- Authors: Sannara EK, François Portet, Philippe Lalanda
- Abstract summary: Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models.
We present Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture.
Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer FLoating-point Operations Per Second (FLOPS) and parameters while outperforming current state-of-the-art results.
- Score: 0.5505634045241288
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Activity Recognition (HAR) on mobile devices has been shown to
be achievable with lightweight neural models learned from data generated by the
user's inertial measurement units (IMUs). Most approaches for instance-based HAR
have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks
(LSTMs), or a combination of the two to achieve state-of-the-art results with
real-time performance. Recently, the Transformer architecture, first in the
language processing domain and then in the vision domain, has pushed the state
of the art beyond these classical architectures. However, such Transformer
architectures are heavyweight in computing resources, which makes them ill
suited for the embedded HAR applications found in the pervasive computing
domain. In this study, we present the Human Activity Recognition Transformer
(HART), a lightweight, sensor-wise transformer architecture that has been
specifically adapted to the domain of the IMUs embedded on mobile devices. Our
experiments on HAR tasks with several publicly available datasets show that
HART uses fewer FLoating-point Operations Per Second (FLOPS) and parameters
while outperforming current state-of-the-art results. Furthermore, we present
evaluations of various architectures' performance in heterogeneous environments
and show that our models generalize better to different sensing devices or
on-body positions.
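To make the sensor-wise idea concrete, here is a minimal PyTorch sketch of a transformer that processes each IMU sensor (accelerometer and gyroscope) in its own lightweight branch before fusing the branch outputs for classification. The patch-embedding scheme, layer sizes, and attention variant are illustrative assumptions, not the authors' HART implementation.

```python
# Illustrative sketch only: a sensor-wise transformer for IMU windows,
# NOT the authors' exact HART architecture. All dimensions are assumptions.
import torch
import torch.nn as nn


class SensorBranch(nn.Module):
    """Patch-embeds one sensor's channels and applies a small transformer encoder."""

    def __init__(self, in_channels=3, dim=64, patch_len=16, depth=2, heads=4):
        super().__init__()
        # Non-overlapping temporal patches, embedded with a strided 1D convolution.
        self.patch_embed = nn.Conv1d(in_channels, dim, kernel_size=patch_len, stride=patch_len)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                # x: (batch, channels, time)
        tokens = self.patch_embed(x)     # (batch, dim, n_patches)
        tokens = tokens.transpose(1, 2)  # (batch, n_patches, dim)
        return self.encoder(tokens)


class SensorWiseHAR(nn.Module):
    """One branch per IMU sensor; pooled branch outputs are fused for classification."""

    def __init__(self, n_classes=6, dim=64):
        super().__init__()
        self.accel = SensorBranch(in_channels=3, dim=dim)
        self.gyro = SensorBranch(in_channels=3, dim=dim)
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, accel, gyro):        # each: (batch, 3, time)
        a = self.accel(accel).mean(dim=1)  # global average pool over patches
        g = self.gyro(gyro).mean(dim=1)
        return self.head(torch.cat([a, g], dim=-1))


if __name__ == "__main__":
    model = SensorWiseHAR()
    accel = torch.randn(8, 3, 128)   # 128-sample window, 3 accelerometer axes
    gyro = torch.randn(8, 3, 128)
    print(model(accel, gyro).shape)  # torch.Size([8, 6])
    print(sum(p.numel() for p in model.parameters()), "parameters")
```

Keeping each per-sensor branch shallow and pooling over patches is the kind of design choice that keeps the parameter and FLOP budget small enough for on-device inference; the final line gives a quick way to check that budget.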
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the limitations of cloud-based processing in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [59.193626019860226]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks thanks to the powerful global context modeling capability of their token mixer.
We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers.
We show that CAS-ViT achieves a competitive performance when compared to other state-of-the-art backbones.
arXiv Detail & Related papers (2024-08-07T11:33:46Z)
- Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices [3.809702129519641]
New deep neural network (DNN) architectures and approaches are emerging every few years, driving the field's advancement.
Transformers are a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges.
This work aims to take steps towards bridging this gap by examining the current state of Transformers' on-device execution.
arXiv Detail & Related papers (2023-06-20T10:15:01Z)
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
arXiv Detail & Related papers (2023-06-11T23:13:51Z)
- Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living [25.04517296731092]
Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models.
We introduce an activity domain generation framework which creates novel ADL appearances from different existing activity modalities.
Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains.
arXiv Detail & Related papers (2022-08-03T08:28:33Z)
- Exploring Transformers for Behavioural Biometrics: A Case Study in Gait Recognition [0.7874708385247353]
This article explores and proposes novel gait biometric recognition systems based on Transformers.
Several state-of-the-art architectures (Vanilla, Informer, Autoformer, Block-Recurrent Transformer, and THAT) are considered in the experimental framework.
Experiments are carried out using two popular public databases, whuGAIT and OU-ISIR.
arXiv Detail & Related papers (2022-06-03T08:08:40Z)
- UMSNet: An Universal Multi-sensor Network for Human Activity Recognition [10.952666953066542]
This paper proposes a universal multi-sensor network (UMSNet) for human activity recognition.
In particular, we propose a new lightweight sensor residual block (called the LSR block), which improves performance.
Our framework has a clear structure and can be directly applied to various types of multi-modal Time Series Classification tasks.
arXiv Detail & Related papers (2022-05-24T03:29:54Z)
- Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.
arXiv Detail & Related papers (2022-02-01T19:03:03Z)
- UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP operators for building a series of all-operator network architectures.
We identify that the widely used strided-convolution or pooling-based down-sampling modules become performance bottlenecks when these operators are combined to form a network.
To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
arXiv Detail & Related papers (2021-10-08T11:09:40Z)
- Multi-Exit Vision Transformer for Dynamic Inference [88.17413955380262]
We propose seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones.
We show that each one of our proposed architectures could prove useful in the trade-off between accuracy and speed.
arXiv Detail & Related papers (2021-06-29T09:01:13Z)
- Real-time Human Activity Recognition Using Conditionally Parametrized Convolutions on Mobile and Wearable Devices [14.260179062012512]
Deep convolutional neural networks (CNNs) have achieved state-of-the-art performance on various HAR datasets.
However, the high number of operations in deep learning increases computational cost and is not suitable for real-time HAR using mobile and wearable sensors.
We propose an efficient CNN using conditionally parametrized convolutions for real-time HAR on mobile and wearable devices (a minimal sketch of the general idea follows this list).
arXiv Detail & Related papers (2020-06-05T07:06:42Z)
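As noted in the last entry above, the following is a minimal sketch of the general conditionally parameterized convolution idea applied to 1D sensor windows: a tiny routing network produces per-example weights that mix several convolution kernel "experts". The layer name, expert count, and routing network are illustrative assumptions; this is not the code accompanying that paper.

```python
# Illustrative sketch of a conditionally parameterized 1D convolution for HAR:
# per-example routing weights mix several kernel "experts". This is a generic
# reimplementation of the idea, not the code released with the paper above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CondConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size, num_experts=4, padding=0):
        super().__init__()
        self.num_experts = num_experts
        self.out_ch = out_ch
        self.padding = padding
        # One weight tensor per expert: (experts, out_ch, in_ch, kernel)
        self.weight = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, kernel_size) * 0.02
        )
        # Routing network: global average pool over time -> linear -> sigmoid
        self.router = nn.Linear(in_ch, num_experts)

    def forward(self, x):                                     # x: (batch, in_ch, time)
        b, in_ch, t = x.shape
        routes = torch.sigmoid(self.router(x.mean(dim=2)))    # (batch, experts)
        # Mix expert kernels per example: (batch, out_ch, in_ch, kernel)
        kernels = torch.einsum("be,eoik->boik", routes, self.weight)
        # Apply each example's kernel in one call via a grouped convolution.
        x = x.reshape(1, b * in_ch, t)
        kernels = kernels.reshape(b * self.out_ch, in_ch, -1)
        out = F.conv1d(x, kernels, padding=self.padding, groups=b)
        return out.reshape(b, self.out_ch, -1)


if __name__ == "__main__":
    layer = CondConv1d(in_ch=6, out_ch=32, kernel_size=5, padding=2)
    window = torch.randn(8, 6, 128)   # batch of 6-channel IMU windows
    print(layer(window).shape)        # torch.Size([8, 32, 128])
```

Because the routing adds only a small linear layer, the per-example kernel specialization comes at little extra cost compared with a single static convolution, which is what makes the approach attractive for real-time inference on mobile and wearable hardware.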
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.