Towards Generalizable Human Activity Recognition: A Survey
- URL: http://arxiv.org/abs/2508.12213v1
- Date: Sun, 17 Aug 2025 03:04:39 GMT
- Title: Towards Generalizable Human Activity Recognition: A Survey
- Authors: Yize Cai, Baoshen Guo, Flora Salim, Zhiqing Hong
- Abstract summary: IMU-based Human Activity Recognition (HAR) has attracted increasing attention from both academia and industry in recent years. HAR performance has improved considerably in specific scenarios, but its generalization capability remains a key barrier to widespread real-world adoption. In this survey, we explore the rapidly evolving field of IMU-based generalizable HAR, reviewing 229 research papers alongside 25 publicly available datasets.
- Score: 4.08377734173712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a critical component of Wearable AI, IMU-based Human Activity Recognition (HAR) has attracted increasing attention from both academia and industry in recent years. Although HAR performance has improved considerably in specific scenarios, its generalization capability remains a key barrier to widespread real-world adoption. For example, domain shifts caused by variations in users, sensor positions, or environments can significantly degrade performance in practice. In this survey, we explore the rapidly evolving field of IMU-based generalizable HAR, reviewing 229 research papers alongside 25 publicly available datasets to provide a broad and insightful overview. We first present the background and overall framework of IMU-based HAR tasks, as well as the generalization-oriented training settings. Then, we categorize representative methodologies from two perspectives: (i) model-centric approaches, including pre-training methods, end-to-end methods, and large language model (LLM)-based learning methods; and (ii) data-centric approaches, including multi-modal learning and data augmentation techniques. In addition, we summarize widely used datasets in this field, as well as relevant tools and benchmarks. Building on these methodological advances, we also review and discuss the broad applicability of IMU-based HAR. Finally, we discuss persistent challenges (e.g., data scarcity, efficient training, and reliable evaluation) and outline future directions for HAR, including the adoption of foundation and large language models, physics-informed and context-aware reasoning, generative modeling, and resource-efficient training and inference. The complete paper list for this survey is available at https://github.com/rh20624/Awesome-IMU-Sensing, which will be updated continuously.
Related papers
- On-device Large Multi-modal Agent for Human Activity Recognition [1.9342524451932614]
Human Activity Recognition (HAR) has been an active area of research, with applications ranging from healthcare to smart environments. Recent advancements in Large Language Models (LLMs) have opened new possibilities to leverage their capabilities in HAR. We present a Large Multi-Modal Agent designed for HAR, which integrates the power of LLMs to enhance both performance and user engagement.
arXiv Detail & Related papers (2025-12-17T22:05:05Z)
- Deepfake Detection that Generalizes Across Benchmarks [63.29485283822232]
This work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of a pre-trained CLIP vision encoder. We conducted an extensive evaluation on 13 benchmark datasets spanning from 2019 to 2025. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC.
arXiv Detail & Related papers (2025-08-08T12:03:56Z)
- Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning [3.177649348456073]
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. We propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data. Our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach.
arXiv Detail & Related papers (2025-07-17T18:47:46Z)
- Towards Modality Generalization: A Benchmark and Prospective Analysis [68.20973671493203]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities. We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization. Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z)
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. EM-MIA achieves state-of-the-art results on WikiMIA.
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
- A Controlled Study on Long Context Extension and Generalization in LLMs [85.4758128256142]
Broad textual understanding and in-context learning require language models that utilize full document contexts.
Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts.
We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data.
arXiv Detail & Related papers (2024-09-18T17:53:17Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable Factors [4.199844472131922]
We provide an exhaustive review of contemporary deep learning research in the field of wearable human activity recognition (WHAR).
Our findings suggest that a major trend is the lack of detail provided by model training protocols.
With insights from the analyses, we define a novel integrated training procedure tailored to the WHAR model.
arXiv Detail & Related papers (2024-01-10T17:45:28Z)
- Temporal Action Localization for Inertial-based Human Activity Recognition [9.948823510429902]
Temporal Action Localization (TAL), as used in video-based Human Activity Recognition, has followed a segment-based prediction approach, localizing activity segments in a timeline of arbitrary length.
This paper is the first to systematically demonstrate the applicability of state-of-the-art TAL models for both offline and near-online Human Activity Recognition (HAR).
We show that by analyzing timelines as a whole, TAL models can produce more coherent segments and achieve higher NULL-class accuracy across all datasets.
arXiv Detail & Related papers (2023-11-27T13:55:21Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP).
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Invariant Feature Learning for Sensor-based Human Activity Recognition [11.334750079923428]
We present an invariant feature learning framework (IFLF) that extracts common information shared across subjects and devices.
Experiments demonstrated that IFLF is effective in handling both subject and device diversion across popular open datasets and an in-house dataset.
arXiv Detail & Related papers (2020-12-14T21:56:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.