Transformer-based Models to Deal with Heterogeneous Environments in Human Activity Recognition
- URL: http://arxiv.org/abs/2209.11750v2
- Date: Sat, 23 Aug 2025 20:07:17 GMT
- Title: Transformer-based Models to Deal with Heterogeneous Environments in Human Activity Recognition
- Authors: Sannara EK, François Portet, Philippe Lalanda
- Abstract summary: Human Activity Recognition (HAR) on mobile devices has been demonstrated to be possible using neural models trained on data collected from the device's inertial measurement units. These models have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Transformers, or a combination of these to achieve state-of-the-art results with real-time performance. This paper highlights the issue of data heterogeneity in machine learning applications and how it can hinder their deployment in pervasive settings. We propose and publicly release the code of two sensor-wise Transformer architectures, HART and MobileHART (Human Activity Recognition Transformer).
- Score: 2.8381580557475963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human Activity Recognition (HAR) on mobile devices has been demonstrated to be possible using neural models trained on data collected from the device's inertial measurement units. These models have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), Transformers, or a combination of these to achieve state-of-the-art results with real-time performance. However, these approaches have not been extensively evaluated in real-world situations where the input data may differ from the training data. This paper highlights the issue of data heterogeneity in machine learning applications and how it can hinder their deployment in pervasive settings. To address this problem, we propose and publicly release the code of two sensor-wise Transformer architectures, HART and MobileHART (Human Activity Recognition Transformer). Our experiments on several publicly available datasets show that these HART architectures outperform previous architectures with fewer floating-point operations and parameters than conventional Transformers. The results also show they are more robust to changes in mobile position or device brand, and hence better suited to the heterogeneous environments encountered in real-life settings. Finally, the source code has been made publicly available.
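The sensor-wise idea can be pictured with a short sketch: instead of one attention block over all inertial channels, each sensor stream gets its own attention branch. The PyTorch code below is a minimal illustration under that reading, assuming two streams (3-axis accelerometer and gyroscope) already embedded to a fixed width; module names and sizes are illustrative and do not reproduce the released HART code.

```python
import torch
import torch.nn as nn

class SensorWiseBlock(nn.Module):
    """Illustrative sensor-wise Transformer block: each sensor stream
    (e.g. accelerometer, gyroscope) gets its own attention branch, and
    the branch outputs are concatenated afterwards."""
    def __init__(self, dim_per_sensor=64, n_sensors=2, n_heads=4):
        super().__init__()
        self.n_sensors = n_sensors
        self.branches = nn.ModuleList([
            nn.MultiheadAttention(dim_per_sensor, n_heads, batch_first=True)
            for _ in range(n_sensors)
        ])
        total = dim_per_sensor * n_sensors
        self.norm = nn.LayerNorm(total)
        self.mlp = nn.Sequential(
            nn.Linear(total, 4 * total), nn.GELU(), nn.Linear(4 * total, total)
        )

    def forward(self, x):                       # x: (batch, time, n_sensors * dim)
        chunks = x.chunk(self.n_sensors, dim=-1)
        attended = [attn(c, c, c)[0] for attn, c in zip(self.branches, chunks)]
        h = x + torch.cat(attended, dim=-1)     # residual over sensor-wise attention
        return h + self.mlp(self.norm(h))       # standard Transformer MLP sub-block
```

Keeping the streams separate is what gives the robustness argument its footing: a shift in one sensor (say, a different gyroscope) perturbs only that branch rather than the whole representation.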
Related papers
- Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z) - Set Transformer Architectures and Synthetic Data Generation for Flow-Guided Nanoscale Localization [13.521075124606973]
Flow-guided localization (FGL) enables the identification of spatial regions within the human body that contain an event of diagnostic interest. Existing FGL solutions rely on graph models with fixed topologies or handcrafted features, which limit their adaptability to anatomical variability and hinder scalability. Our formulation treats nanodevices' circulation time reports as unordered sets, enabling permutation-invariant, variable-length input processing without relying on spatial priors.
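To make the permutation-invariance claim concrete, here is a minimal sketch in the Set Transformer spirit: a shared per-element embedding followed by pooling-by-attention with a learnable seed query, so the output does not depend on element order. Names and dimensions are illustrative, not the paper's model.

```python
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Permutation-invariant set encoder: a shared per-element MLP,
    then a learnable seed query attends over the unordered elements
    (pooling-by-attention), so any reordering gives the same output."""
    def __init__(self, dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.seed = nn.Parameter(torch.randn(1, 1, dim))
        self.pool = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, reports):                 # reports: (batch, set_size, 1)
        h = self.embed(reports)                 # same MLP applied to every element
        q = self.seed.expand(h.size(0), -1, -1)
        pooled, _ = self.pool(q, h, h)          # attention is order-agnostic here
        return pooled.squeeze(1)                # (batch, dim)
```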
arXiv Detail & Related papers (2025-08-22T08:22:25Z) - SETransformer: A Hybrid Attention-Based Architecture for Robust Human Activity Recognition [7.291558599547268]
Human Activity Recognition (HAR) using wearable sensor data has become a central task in mobile computing, healthcare, and human-computer interaction. We propose SETransformer, a hybrid deep neural architecture that combines Transformer-based temporal modeling with channel-wise squeeze-and-excitation (SE) attention and a learnable temporal attention pooling mechanism. We evaluate SETransformer on the WISDM dataset and demonstrate that it significantly outperforms conventional models including LSTM, GRU, BiLSTM, and CNN baselines.
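As a hedged reading of the two attention components named above, the sketch below implements channel-wise squeeze-and-excitation and a learnable temporal attention pooling for inputs shaped (batch, time, channels); it illustrates the general mechanisms, not the authors' SETransformer.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Channel-wise squeeze-and-excitation: global-average over time,
    then a bottleneck MLP produces per-channel sigmoid gates."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                  # x: (batch, time, channels)
        gates = self.fc(x.mean(dim=1))     # "squeeze" over the time axis
        return x * gates.unsqueeze(1)      # "excite": rescale each channel

class TemporalAttentionPool(nn.Module):
    """Learnable temporal attention pooling: a scored, weighted sum over time."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Linear(channels, 1)

    def forward(self, x):                  # x: (batch, time, channels)
        w = torch.softmax(self.score(x), dim=1)
        return (x * w).sum(dim=1)          # (batch, channels)
```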
arXiv Detail & Related papers (2025-05-25T23:39:34Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT devices by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Transformer-Based Approaches for Sensor-Based Human Activity Recognition: Opportunities and Challenges [0.5983301154764783]
Transformers have excelled in natural language processing and computer vision, paving their way into sensor-based Human Activity Recognition (HAR).
Previous studies show that transformers outperform their counterparts exclusively when they harness abundant data or employ compute-intensive optimization algorithms.
However, neither of these scenarios is viable in sensor-based HAR due to the scarcity of data in this field and the frequent need to perform training and inference on resource-constrained devices.
arXiv Detail & Related papers (2024-10-17T14:39:55Z) - CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [59.193626019860226]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks, owing to the powerful global-context capability of their token mixers.
We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers.
We show that CAS-ViT achieves a competitive performance when compared to other state-of-the-art backbones.
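The exact convolutional additive self-attention design is given in the CAS-ViT paper; as a generic sketch of the additive-attention family it belongs to, the code below replaces the quadratic softmax(QKᵀ) interaction with elementwise gating contexts summed from Q and K, which keeps cost linear in the number of tokens. All details here are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn

class AdditiveTokenMixer(nn.Module):
    """Generic additive-attention sketch: queries and keys contribute
    elementwise gating contexts that are summed and applied to the values,
    avoiding the quadratic QK^T matrix of standard self-attention."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, tokens, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        context = torch.sigmoid(q) + torch.sigmoid(k)   # additive, O(N) in tokens
        return self.proj(context * v)
```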
arXiv Detail & Related papers (2024-08-07T11:33:46Z) - SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation [49.65221743520028]
We show that shifting the multiscale inductive bias into the attention mechanism can work well, resulting in a plain detector, 'SimPLR'.
We find through our experiments that SimPLR with scale-aware attention is a plain and simple architecture, yet competitive with multi-scale vision transformer alternatives.
arXiv Detail & Related papers (2023-10-09T17:59:26Z) - EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
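The paper defines $\mathcal{L}_{EC}$ itself; purely as a point of reference, contrastive objectives of this kind commonly take an InfoNCE-style form, pulling embeddings $z_i, z_j$ of two views of the same event clip together while pushing away the other clips in a batch:

$$\mathcal{L} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k \neq i} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$

where $\mathrm{sim}$ is typically cosine similarity and $\tau$ a temperature; the paper's actual formulation may differ.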
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices [3.809702129519641]
New deep neural network (DNN) architectures and approaches are emerging every few years, driving the field's advancement.
Transformers are a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges.
This work aims to make steps towards bridging this gap by examining the current state of Transformers' on-device execution.
arXiv Detail & Related papers (2023-06-20T10:15:01Z) - A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899]
Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long-range dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
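For reference, the self-attention mechanism these surveys build on is the standard scaled dot-product form, where queries, keys, and values are linear projections of the input sequence:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

with $d_k$ the key dimension; every position attends to every other, which is what enables the long-range dependency handling and parallelism noted above.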
arXiv Detail & Related papers (2023-06-11T23:13:51Z) - Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living [25.04517296731092]
Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models.
We introduce an activity domain generation framework which creates novel ADL appearances from different existing activity modalities.
Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains.
arXiv Detail & Related papers (2022-08-03T08:28:33Z) - Exploring Transformers for Behavioural Biometrics: A Case Study in Gait Recognition [0.7874708385247353]
This article explores and proposes novel gait biometric recognition systems based on Transformers.
Several state-of-the-art architectures (Vanilla, Informer, Autoformer, Block-Recurrent Transformer, and THAT) are considered in the experimental framework.
Experiments are carried out using the two popular public databases whuGAIT and OU-ISIR.
arXiv Detail & Related papers (2022-06-03T08:08:40Z) - UMSNet: An Universal Multi-sensor Network for Human Activity Recognition [10.952666953066542]
This paper proposes a universal multi-sensor network (UMSNet) for human activity recognition.
In particular, we propose a new lightweight sensor residual block (called LSR block), which improves the performance.
Our framework has a clear structure and can be directly applied to various types of multi-modal Time Series Classification tasks.
arXiv Detail & Related papers (2022-05-24T03:29:54Z) - Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows [57.00864538284686]
Iwin Transformer is a hierarchical Transformer which progressively performs token representation learning and token agglomeration within irregular windows.
The effectiveness and efficiency of Iwin Transformer are verified on the two standard HOI detection benchmark datasets.
arXiv Detail & Related papers (2022-03-20T12:04:50Z) - Improving Sample Efficiency of Value Based Models Using Attention and Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.
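A minimal sketch of the stated design, assuming Atari-style stacked frames: a convolutional stem produces a feature map whose spatial positions become tokens for multi-head self-attention. Layer sizes and names are illustrative, not the paper's exact model.

```python
import torch
import torch.nn as nn

class AttentiveStateEncoder(nn.Module):
    """Self-attention over the spatial positions of a conv feature map:
    the H*W locations become tokens for multi-head attention."""
    def __init__(self, channels=64, n_heads=4):
        super().__init__()
        self.conv = nn.Conv2d(4, channels, kernel_size=8, stride=4)  # 4 stacked frames in
        self.attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, frames):                    # frames: (batch, 4, 84, 84)
        f = self.conv(frames)                     # (batch, C, 20, 20)
        tokens = f.flatten(2).transpose(1, 2)     # (batch, 400, C)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.mean(dim=1)                    # pooled state embedding
```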
arXiv Detail & Related papers (2022-02-01T19:03:03Z) - UniNet: Unified Architecture Search with Convolution, Transformer, and MLP [62.401161377258234]
In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures.
We identify that the widely-used strided convolution or pooling-based down-sampling modules become performance bottlenecks when operators are combined to form a network.
To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware down-sampling modules.
arXiv Detail & Related papers (2021-10-08T11:09:40Z) - Multi-Exit Vision Transformer for Dynamic Inference [88.17413955380262]
We propose seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones.
We show that each one of our proposed architectures could prove useful in the trade-off between accuracy and speed.
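The early-exit pattern these branches enable at inference time can be sketched as follows: lightweight heads attached after intermediate blocks vote, and computation stops as soon as a head clears a confidence threshold. Head design and threshold are illustrative assumptions, not one of the paper's seven architectures.

```python
import torch

def early_exit_forward(blocks, heads, x, threshold=0.9):
    """Run Transformer blocks sequentially (batch size 1 assumed); after
    each block, a small classifier head votes, and inference stops once
    the head's maximum softmax probability clears the threshold."""
    for block, head in zip(blocks, heads):
        x = block(x)
        probs = torch.softmax(head(x.mean(dim=1)), dim=-1)  # pool tokens, classify
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:          # confident enough: exit early
            return pred, probs
    return pred, probs                        # fall through to the final exit
```

The accuracy/speed trade-off the authors report follows directly from the threshold: lower values exit earlier and cheaper, higher values defer more inputs to the full backbone.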
arXiv Detail & Related papers (2021-06-29T09:01:13Z) - Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Description of Structural Biases and Associated Data in Sensor-Rich Environments [6.548580592686077]
We study activity recognition in the context of sensor-rich environments.
We address the problem of inductive biases and their impact on the data collection process.
We propose a metamodeling process in which the sensor data is structured in layers.
arXiv Detail & Related papers (2021-04-11T00:26:59Z) - Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z) - Real-time Human Activity Recognition Using Conditionally Parametrized Convolutions on Mobile and Wearable Devices [14.260179062012512]
Deep convolutional neural networks (CNNs) have achieved state-of-the-art performance on various HAR datasets.
However, the high number of operations in deep learning increases computational cost and is not suitable for real-time HAR using mobile and wearable sensors.
We propose an efficient CNN using conditionally parametrized convolutions for real-time HAR on mobile and wearable devices.
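Conditionally parametrized convolution (in the spirit of CondConv-style designs) computes per-example routing weights from the input and mixes several expert kernels into one effective kernel, so capacity grows with the number of experts while per-inference cost stays close to a single convolution. The 1-D sketch below is a simplified illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondConv1d(nn.Module):
    """Conditionally parametrized 1-D convolution: a tiny router derives
    per-example mixing weights from the input, and the effective kernel
    is the weighted sum of several expert kernels."""
    def __init__(self, in_ch, out_ch, kernel_size, n_experts=4):
        super().__init__()
        self.experts = nn.Parameter(
            torch.randn(n_experts, out_ch, in_ch, kernel_size) * 0.02)
        self.router = nn.Linear(in_ch, n_experts)
        self.kernel_size = kernel_size

    def forward(self, x):                          # x: (batch, in_ch, time)
        routes = torch.sigmoid(self.router(x.mean(dim=-1)))  # (batch, n_experts)
        outs = []
        for i in range(x.size(0)):                 # per-example kernel mixing
            w = (routes[i].view(-1, 1, 1, 1) * self.experts).sum(dim=0)
            outs.append(F.conv1d(x[i:i+1], w, padding=self.kernel_size // 2))
        return torch.cat(outs, dim=0)
```

The per-example loop is written for clarity; the point of the design is that mixing kernels is a cheap weighted sum, which is why such layers suit real-time mobile budgets better than simply widening the network.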
arXiv Detail & Related papers (2020-06-05T07:06:42Z)