Modeling Accurate Human Activity Recognition for Embedded Devices Using
Multi-level Distillation
- URL: http://arxiv.org/abs/2107.07331v1
- Date: Tue, 6 Jul 2021 09:01:41 GMT
- Title: Modeling Accurate Human Activity Recognition for Embedded Devices Using
Multi-level Distillation
- Authors: Runze Chen and Haiyong Luo and Fang Zhao and Xuechun Meng and Zhiqing
Xie and Yida Zhu
- Abstract summary: Human activity recognition (HAR) based on IMU sensors is an essential domain in ubiquitous computing.
We propose SMLDist, a plug-and-play HAR modeling pipeline with multi-level distillation that builds deep convolutional HAR models with native support for embedded devices.
We compare a MobileNet V3 model built with SMLDist against various state-of-the-art HAR frameworks in terms of accuracy, F1 macro score, and energy cost on an embedded platform.
- Score: 5.746224188845082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human activity recognition (HAR) based on IMU sensors is an essential domain
in ubiquitous computing. With the growing trend of deploying artificial intelligence
on IoT devices and smartphones, more researchers are designing HAR models for
embedded devices. We propose SMLDist, a plug-and-play HAR modeling pipeline with
multi-level distillation that builds deep convolutional HAR models with native
support for embedded devices. SMLDist consists of stage distillation, memory
distillation, and logits distillation, which together cover the entire information
flow of the deep models. Stage distillation constrains the learning direction of the
intermediate features. Memory distillation teaches the student models how to explain
and store the inner relationships between high-dimensional features based on Hopfield
networks. Logits distillation constructs distilled logits by a smoothed conditional
rule to preserve the probability distribution and improve the correctness of the soft
targets. We compare a MobileNet V3 model built by SMLDist with various
state-of-the-art HAR frameworks in terms of accuracy, F1 macro score, and energy cost
on an embedded platform. The produced model achieves a good balance of robustness,
efficiency, and accuracy. At an equal compression rate, SMLDist also compresses
models with smaller performance loss than other state-of-the-art knowledge
distillation methods on seven public datasets.
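As a rough illustration of how such a three-level objective can be combined, the following PyTorch-style sketch pairs a stage term (intermediate-feature matching), a memory term, and a logits term. It is not the authors' implementation: the memory term here approximates the Hopfield-based memory distillation with a feature self-similarity match, the logits term uses standard temperature-softened KL divergence rather than the paper's smoothed conditional rule, and all weights are placeholder assumptions.
```python
# Minimal sketch (not the authors' implementation) of a three-level
# distillation loss in the spirit of SMLDist, written in PyTorch.
import torch
import torch.nn.functional as F

def stage_loss(student_feats, teacher_feats):
    """Constrain the student's intermediate (stage) features toward the teacher's."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        # A real pipeline would project mismatched shapes; equal shapes are assumed here.
        loss = loss + F.mse_loss(fs, ft.detach())
    return loss / max(len(student_feats), 1)

def memory_loss(fs, ft):
    """Match pairwise feature relationships (a simple stand-in for memory distillation)."""
    fs = F.normalize(fs.flatten(1), dim=1)          # (batch, dim)
    ft = F.normalize(ft.flatten(1), dim=1)
    return F.mse_loss(fs @ fs.t(), (ft @ ft.t()).detach())

def logits_loss(student_logits, teacher_logits, labels, T=4.0):
    """Temperature-softened soft targets plus the usual hard-label term."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return soft + hard

def smldist_style_loss(s_feats, t_feats, s_logits, t_logits, labels,
                       w_stage=1.0, w_mem=1.0, w_logits=1.0):
    # Placeholder weights; the paper's actual balancing is not reproduced here.
    return (w_stage * stage_loss(s_feats, t_feats)
            + w_mem * memory_loss(s_feats[-1], t_feats[-1])
            + w_logits * logits_loss(s_logits, t_logits, labels))
```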
Related papers
- Smooth-Distill: A Self-distillation Framework for Multitask Learning with Wearable Sensor Data [0.0]
This paper introduces Smooth-Distill, a novel self-distillation framework designed to simultaneously perform human activity recognition (HAR) and sensor placement detection. Unlike conventional distillation methods that require separate teacher and student models, the proposed framework utilizes a smoothed, historical version of the model itself as the teacher. Experimental results show that Smooth-Distill consistently outperforms alternative approaches across different evaluation scenarios.
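A "smoothed, historical version of the model" is commonly realized as an exponential moving average (EMA) of the student's own weights. The sketch below shows that pattern only as an illustration; the momentum value and update schedule are assumptions, not necessarily Smooth-Distill's exact formulation.
```python
# Sketch of an EMA ("smoothed, historical") teacher for self-distillation.
# The momentum value and update schedule are assumptions for illustration.
import copy
import torch

@torch.no_grad()
def update_ema_teacher(student: torch.nn.Module, teacher: torch.nn.Module, momentum: float = 0.999):
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

# Usage: the teacher starts as a frozen copy of the student and is smoothed each step.
student = torch.nn.Linear(64, 6)                  # stand-in for a HAR backbone
teacher = copy.deepcopy(student).requires_grad_(False)
# ... after each optimizer step on the student:
update_ema_teacher(student, teacher)
```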
arXiv Detail & Related papers (2025-06-27T06:51:51Z) - Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition [3.5938832647391]
PatchEchoClassifier is a novel model that leverages a reservoir-based mechanism known as the Echo State Network (ESN). The model is designed for human activity recognition (HAR) using one-dimensional sensor signals and incorporates a tokenizer to extract patch-level representations. Experimental evaluations on multiple HAR datasets demonstrate that our model achieves over 80 percent accuracy while significantly reducing computational cost.
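An Echo State Network keeps a fixed, randomly initialized recurrent reservoir and trains only a lightweight readout on its states, which is what keeps the computational cost low. The NumPy sketch below illustrates the basic reservoir update; the reservoir size, leak rate, and spectral scaling are illustrative assumptions rather than PatchEchoClassifier's actual configuration.
```python
# Minimal Echo State Network state update in NumPy (illustrative only;
# reservoir size, leak rate, and scaling are assumed values).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 200                               # e.g. 3-axis IMU input, 200 reservoir units
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # scale spectral radius below 1

def run_reservoir(inputs, leak=0.3):
    """inputs: (T, n_in) sensor window -> (T, n_res) reservoir states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.stack(states)

# Only a linear readout (e.g. ridge regression) on the states is trained;
# W_in and W stay fixed.
```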
arXiv Detail & Related papers (2025-05-29T01:48:36Z) - Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
A self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representations to distill only from task-relevant representations.
Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the LLM era is the generalization issue. We propose the Model Utilization Index (MUI), a mechanism-interpretability-enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - SphOR: A Representation Learning Perspective on Open-set Recognition for Identifying Unknown Classes in Deep Learning Models [1.2172320168050468]
We introduce SphOR, a representation learning method that models the feature space as a mixture of von Mises-Fisher distributions.
This approach enables the use of semantically ambiguous samples during training, to improve the detection of samples from unknown classes.
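For context, a von Mises-Fisher (vMF) distribution defines a density over unit-normalized feature vectors, so a mixture of class-wise vMF components can score how "known" a feature looks. The sketch below illustrates that scoring; the component parameters and the way SphOR actually fits them are assumptions.
```python
# Sketch: log-density of unit-norm features under a mixture of
# von Mises-Fisher (vMF) components, one per known class.
# Component means/concentrations/weights are illustrative assumptions.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function

def vmf_log_pdf(x, mu, kappa):
    """x, mu: unit vectors of dimension p; kappa: concentration (> 0)."""
    p = x.shape[-1]
    # log C_p(kappa), using log I_v(kappa) = log(ive(v, kappa)) + kappa for stability
    log_norm = ((p / 2 - 1) * np.log(kappa)
                - (p / 2) * np.log(2 * np.pi)
                - (np.log(ive(p / 2 - 1, kappa)) + kappa))
    return log_norm + kappa * float(mu @ x)

def mixture_score(x, mus, kappas, weights):
    """Higher score -> more 'known'; a low score can flag an unknown class."""
    logs = [np.log(w) + vmf_log_pdf(x, mu, k) for mu, k, w in zip(mus, kappas, weights)]
    m = max(logs)
    return m + np.log(sum(np.exp(l - m) for l in logs))   # log-sum-exp
```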
arXiv Detail & Related papers (2025-03-11T05:06:11Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.
We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.
Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
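The sigmoid-plus-straight-through routing mentioned above can be sketched as a gate that is hard (0/1) in the forward pass but differentiable in the backward pass. The snippet below shows that gating pattern only; the block partitioning, threshold, and mixing are illustrative assumptions rather than DSMoE's exact design.
```python
# Sketch of sigmoid gating with a straight-through estimator (STE):
# hard 0/1 expert selection in the forward pass, sigmoid gradients in backward.
# Threshold and shapes are illustrative assumptions.
import torch

def ste_sigmoid_gate(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    soft = torch.sigmoid(logits)                # differentiable gate in (0, 1)
    hard = (soft > threshold).float()           # discrete expert selection
    return hard + (soft - soft.detach())        # forward: hard, backward: grad of soft

# Usage: gate each token over N expert blocks and mix their outputs.
tokens = torch.randn(4, 16)                     # (batch, hidden)
router = torch.nn.Linear(16, 3)                 # 3 expert blocks
gates = ste_sigmoid_gate(router(tokens))        # (batch, 3) in {0, 1}
experts = [torch.nn.Linear(16, 16) for _ in range(3)]
out = sum(gates[:, i:i+1] * experts[i](tokens) for i in range(3))
```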
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models [4.737806982257592]
This study proposes a knowledge distillation algorithm based on large language models and feature alignment.
The proposed model performs very close to the state-of-the-art GPT-4 model in terms of evaluation indicators such as perplexity, BLEU, ROUGE, and CER.
arXiv Detail & Related papers (2024-12-27T04:37:06Z) - MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification [19.476061046309052]
We present a scalable mixture-of-experts (MoE) based refinement of a distilled large language model, optimized for encrypted traffic classification.
Experiments on 10 datasets show superior or competitive performance over state-of-the-art models.
arXiv Detail & Related papers (2024-11-20T03:01:41Z) - CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination [28.061239778773423]
Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks.
CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption of computational resources.
We introduce CLIP-CID, a novel distillation mechanism that effectively transfers knowledge from a large vision-language foundation model to a smaller model.
arXiv Detail & Related papers (2024-08-18T11:23:21Z) - TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods overcome this by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
arXiv Detail & Related papers (2024-04-02T02:29:41Z) - Learning Lightweight Object Detectors via Multi-Teacher Progressive
Distillation [56.053397775016755]
We propose a sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.
To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students.
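A sequential multi-teacher scheme can be pictured as distilling from one teacher at a time while carrying the student's weights over between rounds. The sketch below illustrates that loop with a classification-style KD loss rather than detection heads; the teacher ordering, loss form, and hyperparameters are assumptions.
```python
# Sketch of sequential (progressive) multi-teacher distillation: the student
# is trained against one teacher at a time, keeping its weights between rounds.
import torch
import torch.nn.functional as F

def distill_round(student, teacher, loader, epochs=1, T=2.0, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, y in loader:
            s_logits, t_logits = student(x), teacher(x).detach()
            loss = F.cross_entropy(s_logits, y) + (T * T) * F.kl_div(
                F.log_softmax(s_logits / T, dim=1),
                F.softmax(t_logits / T, dim=1),
                reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# teachers: a list ordered, e.g., from weakest to strongest; the same student
# instance is refined progressively:
# for teacher in teachers:
#     student = distill_round(student, teacher, train_loader)
```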
arXiv Detail & Related papers (2023-08-17T17:17:08Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE).
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate its effectiveness, showing better validity, sparsity, and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Self-Feature Regularization: Self-Feature Distillation Without Teacher
Models [0.0]
Self-Feature Regularization (SFR) is proposed, which uses features in the deep layers to supervise feature learning in the shallow layers.
We first use a generalization-l2 loss to match local features and a many-to-one approach to distill more intensively in the channel dimension.
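The deep-supervises-shallow idea can be sketched as projecting each shallow feature map to the deepest layer's shape and applying an l2 (MSE) matching loss. The projection and weighting below are illustrative assumptions, not the paper's exact generalization-l2 or many-to-one channel scheme.
```python
# Sketch of self-feature regularization: the deepest feature map supervises
# shallower ones via an l2 (MSE) match after a simple projection.
import torch
import torch.nn.functional as F

def self_feature_reg_loss(feats, projections):
    """feats: list of (B, C_i, H_i, W_i) maps from shallow to deep;
    projections: 1x1 convs mapping each shallow C_i to the deepest C."""
    deep = feats[-1].detach()                          # deepest layer acts as the "teacher"
    loss = 0.0
    for f, proj in zip(feats[:-1], projections):
        f = proj(f)                                    # align channel dimension
        f = F.adaptive_avg_pool2d(f, deep.shape[-2:])  # align spatial size
        loss = loss + F.mse_loss(f, deep)
    return loss / max(len(feats) - 1, 1)

# Usage sketch: one 1x1 conv per shallow stage, e.g.
# projections = torch.nn.ModuleList(
#     [torch.nn.Conv2d(c, deep_channels, kernel_size=1) for c in shallow_channels])
```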
arXiv Detail & Related papers (2021-03-12T15:29:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.