Related papers: Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data

Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data

URL: http://arxiv.org/abs/2502.00779v1
Date: Sun, 02 Feb 2025 12:33:52 GMT
Title: Role of Mixup in Topological Persistence Based Knowledge Distillation for Wearable Sensor Data
Authors: Eun Som Jeon, Hongjun Choi, Matthew P. Buman, Pavan Turaga,
Abstract summary: We analyze the role of mixup in KD with time-series as well as topological persistence, employing multiple teachers.<n>We present a comprehensive analysis of various methods in KD and mixup on wearable sensor data.
Score: 3.6579002555961915
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The analysis of wearable sensor data has enabled many successes in several applications. To represent the high-sampling rate time-series with sufficient detail, the use of topological data analysis (TDA) has been considered, and it is found that TDA can complement other time-series features. Nonetheless, due to the large time consumption and high computational resource requirements of extracting topological features through TDA, it is difficult to deploy topological knowledge in various applications. To tackle this problem, knowledge distillation (KD) can be adopted, which is a technique facilitating model compression and transfer learning to generate a smaller model by transferring knowledge from a larger network. By leveraging multiple teachers in KD, both time-series and topological features can be transferred, and finally, a superior student using only time-series data is distilled. On the other hand, mixup has been popularly used as a robust data augmentation technique to enhance model performance during training. Mixup and KD employ similar learning strategies. In KD, the student model learns from the smoothed distribution generated by the teacher model, while mixup creates smoothed labels by blending two labels. Hence, this common smoothness serves as the connecting link that establishes a connection between these two methods. In this paper, we analyze the role of mixup in KD with time-series as well as topological persistence, employing multiple teachers. We present a comprehensive analysis of various methods in KD and mixup on wearable sensor data.

Related papers

Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation [64.15918654558816]
Self-distillation (SSD) training strategy is introduced for filtering and weighting teacher representation to distill from task-relevant representations only. Experimental results on real-world affective computing, wearable/biosignal datasets from the UCR Archive, the HAR dataset, and image classification datasets show that the proposed SSD method can outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-04-19T14:08:56Z)
Faithful Label-free Knowledge Distillation [8.572967695281054]
This paper presents a label-free knowledge distillation approach called Teacher in the Middle (TinTeM) It produces a more faithful student, which better replicates the behavior of the teacher network across a range of benchmarks testing model robustness, generalisability and out-of-distribution detection.
arXiv Detail & Related papers (2024-11-22T01:48:44Z)
Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data [15.326571438985466]
topological features obtained by topological data analysis (TDA) have been suggested as a potential solution. There are two significant obstacles to using topological features in deep learning. We propose to use two teacher networks, one trained on the raw time-series data, and another trained on persistence images generated by TDA methods. A robust student model is distilled, which uses only the time-series data as an input, while implicitly preserving topological features.
arXiv Detail & Related papers (2024-07-07T10:08:34Z)
Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport [46.91791643660991]
Deep learning models for multimodal expression recognition have reached remarkable performance in controlled laboratory environments. These models struggle in the wild because of the unavailability and quality of modalities used for training. In practice, only a subset of the training-time modalities may be available at test time. Learning with privileged information enables models to exploit data from additional modalities that are only available during training.
arXiv Detail & Related papers (2024-01-27T19:44:15Z)
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models. Most existing KD techniques rely on Kullback-Leibler (KL) divergence. We propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation. KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study [4.751886527142779]
Mixup is a popular data augmentation technique based on creating new samples by linear generalization between two given data samples. Knowledge distillation (KD) is widely used for model compression and transfer learning. "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup.
arXiv Detail & Related papers (2022-11-08T01:43:14Z)
Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions. Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph structured data (DFAD-GNN) To be specific, our DFAD-GNN employs a generative adversarial network, which mainly consists of three components: a pre-trained teacher model and a student model are regarded as two discriminators, and a generator is utilized for deriving training graphs to distill knowledge from the teacher model into the student model. Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
How and When Adversarial Robustness Transfers in Knowledge Distillation? [137.11016173468457]
This paper studies how and when the adversarial robustness can be transferred from a teacher model to a student model in Knowledge distillation (KD) We show that standard KD training fails to preserve adversarial robustness, and we propose KD with input gradient alignment (KDIGA) for remedy. Under certain assumptions, we prove that the student model using our proposed KDIGA can achieve at least the same certified robustness as the teacher model.
arXiv Detail & Related papers (2021-10-22T21:30:53Z)
Preventing Catastrophic Forgetting and Distribution Mismatch in Knowledge Distillation via Synthetic Data [5.064036314529226]
We propose a data-free KD framework that maintains a dynamic collection of generated samples over time. Our experiments demonstrate that we can improve the accuracy of the student models obtained via KD when compared with state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T08:11:08Z)
MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability. We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the error and the empirical error. Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.