Role of Data Augmentation Strategies in Knowledge Distillation for
Wearable Sensor Data
- URL: http://arxiv.org/abs/2201.00111v1
- Date: Sat, 1 Jan 2022 04:40:14 GMT
- Title: Role of Data Augmentation Strategies in Knowledge Distillation for
Wearable Sensor Data
- Authors: Eun Som Jeon, Anirudh Som, Ankita Shukla, Kristina Hasanaj, Matthew P.
Buman, Pavan Turaga
- Abstract summary: We study the applicability and challenges of using KD for time-series data for wearable devices.
It is not yet known if there exists a coherent strategy for choosing an augmentation approach during KD.
- Our study considers databases ranging from small-scale, publicly available ones to one derived from a large-scale interventional study of human activity and sedentary behavior.
- Score: 6.638638309021825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are parametrized by several thousands or millions of
parameters, and have shown tremendous success in many classification problems.
However, the large number of parameters makes it difficult to integrate these
models into edge devices such as smartphones and wearable devices. To address
this problem, knowledge distillation (KD) has been widely employed, which uses a
pre-trained, high-capacity network to train a much smaller network suitable for
edge devices. In this paper, for the first time, we study the applicability and
challenges of using KD for time-series data for wearable devices. Successful
application of KD requires specific choices of data augmentation methods during
training. However, it is not yet known if there exists a coherent strategy for
choosing an augmentation approach during KD. In this paper, we report the
results of a detailed study that compares and contrasts various common choices
and some hybrid data augmentation strategies in KD based human activity
analysis. Research in this area is often limited as there are not many
comprehensive databases available in the public domain from wearable devices.
Our study considers databases ranging from small-scale, publicly available ones
to one derived from a large-scale interventional study of human activity and
sedentary behavior. We find that the choice of data augmentation techniques
during KD has a variable level of impact on end performance, and that the
optimal network choice as well as the data augmentation strategy are specific to
the dataset at hand. However, we also conclude with a general set of
recommendations that can provide a strong baseline performance across
databases.
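As a concrete illustration of the setup described in the abstract, the sketch below pairs a standard temperature-scaled, response-based KD loss with a simple jitter-and-scale augmentation applied to wearable sensor windows. This is a minimal sketch assuming a PyTorch-style teacher/student setup; the function names, augmentation parameters, temperature T, and weighting alpha are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def jitter_and_scale(x, sigma=0.05, scale_range=(0.9, 1.1)):
    """Two common time-series augmentations on a batch of sensor
    windows shaped (batch, channels, time): additive Gaussian jitter
    and a per-sample random amplitude scaling."""
    noise = torch.randn_like(x) * sigma
    scale = torch.empty(x.size(0), 1, 1, device=x.device).uniform_(*scale_range)
    return (x + noise) * scale

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Response-based KD objective: a temperature-scaled KL term
    against the teacher's soft targets plus a cross-entropy term
    against the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def distill_step(student, teacher, x, labels, optimizer):
    """One training step: augment the raw window, query the frozen
    teacher, and update the student on the combined KD loss."""
    x_aug = jitter_and_scale(x)
    with torch.no_grad():
        teacher_logits = teacher(x_aug)
    student_logits = student(x_aug)
    loss = kd_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Which augmentations to compose (jittering, scaling, rotation, time-warping, or hybrids of these) and how to weight the soft and hard terms are precisely the dataset-dependent choices the study investigates.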
Related papers
- Scale-up Unlearnable Examples Learning with High-Performance Computing [7.410014640563799]
Unlearnable Examples (UEs) aim to make data unlearnable to deep learning models.
We scaled up UE learning across various datasets using Distributed Data Parallel (DDP) training on the Summit supercomputer.
Our findings reveal that both overly large and overly small batch sizes can lead to performance instability and affect accuracy.
arXiv Detail & Related papers (2025-01-10T16:15:23Z) - Active Data Curation Effectively Distills Large-Scale Multimodal Models [66.23057263509027]
Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones.
In this work we explore an alternative, yet simple approach -- active data curation as effective distillation for contrastive multimodal pretraining.
Our simple online batch selection method, ACID, outperforms strong KD baselines across various model-, data- and compute-configurations.
arXiv Detail & Related papers (2024-11-27T18:50:15Z) - Condensed Sample-Guided Model Inversion for Knowledge Distillation [42.91823325342862]
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model.
KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data.
In this paper, we consider condensed samples as a form of supplementary information, and introduce a method for using them to better approximate the target data distribution.
arXiv Detail & Related papers (2024-08-25T14:43:27Z) - How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
arXiv Detail & Related papers (2024-04-04T13:55:06Z) - Practical Insights into Knowledge Distillation for Pre-Trained Models [6.085875355032475]
This research investigates the enhancement of knowledge distillation (KD) processes in pre-trained models.
Despite the adoption of numerous KD approaches for transferring knowledge among pre-trained models, a comprehensive understanding of KD's application is lacking.
Our study conducts an extensive comparison of multiple KD techniques, including standard KD, tuned KD (via optimized temperature and weight parameters), deep mutual learning, and data partitioning KD.
arXiv Detail & Related papers (2024-02-22T19:07:08Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Categories of Response-Based, Feature-Based, and Relation-Based
Knowledge Distillation [10.899753512019933]
Knowledge Distillation (KD) aims to optimize a lightweight network.
KD mainly involves knowledge extraction and distillation strategies.
This paper provides a comprehensive KD survey, including knowledge categories, distillation schemes and algorithms.
arXiv Detail & Related papers (2023-06-19T03:42:44Z) - CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much better classification results than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z) - Modality-specific Distillation [30.190082262375395]
We propose modality-specific distillation (MSD) to effectively transfer knowledge from a teacher on multimodal datasets.
Our idea aims at mimicking a teacher's modality-specific predictions by introducing an auxiliary loss term for each modality.
Because each modality has different importance for predictions, we also propose weighting approaches for the auxiliary losses.
arXiv Detail & Related papers (2021-01-06T05:45:07Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach. (An illustrative mixup-within-distillation sketch appears after this list.)
arXiv Detail & Related papers (2020-11-01T18:47:51Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
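As referenced in the MixKD entry above, the following sketch shows one way mixup-style augmentation can be folded into a distillation step: pairs of inputs are convexly interpolated, and the student is trained to match the teacher's soft predictions on the mixed samples. This is a generic, input-level illustration under assumed PyTorch conventions, not the exact formulation used by MixKD (which operates on language-model representations); all names and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_kd_step(student, teacher, x, T=4.0, alpha=0.4):
    """Illustrative mixup-within-distillation term: interpolate pairs
    of inputs, then match the student's predictions to the teacher's
    soft targets on the mixed samples. Purely response-based; no
    ground-truth labels are needed for this term."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]   # convex input interpolation
    with torch.no_grad():
        t_logits = teacher(x_mix)             # teacher soft targets on mixed data
    s_logits = student(x_mix)
    return F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```

In practice, this term would typically be added to (or weighted against) a standard cross-entropy loss on unmixed labelled examples, which is how mixup-style distillation is usually combined with limited labelled data.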