Distilling EEG Representations via Capsules for Affective Computing
- URL: http://arxiv.org/abs/2105.00104v1
- Date: Fri, 30 Apr 2021 22:04:35 GMT
- Title: Distilling EEG Representations via Capsules for Affective Computing
- Authors: Guangyi Zhang and Ali Etemad
- Abstract summary: We propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures.
Our framework consistently enables student networks with different compression ratios to effectively learn from the teacher.
Our method achieves state-of-the-art results on one of the two datasets.
- Score: 14.67085109524245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Affective computing with Electroencephalogram (EEG) is a challenging task
that requires cumbersome models to effectively learn the information contained
in large-scale EEG signals, causing difficulties for real-time smart-device
deployment. In this paper, we propose a novel knowledge distillation pipeline
to distill EEG representations via capsule-based architectures for both
classification and regression tasks. Our goal is to distill information from a
heavy model to a lightweight model for subject-specific tasks. To this end, we
first pre-train a large model (teacher network) on large number of training
samples. Then, we employ the teacher network to learn the discriminative
features embedded in capsules by adopting a lightweight model (student network)
to mimic the teacher using the privileged knowledge. Such privileged
information learned by the teacher contain similarities among capsules and are
only available during the training stage of the student network. We evaluate
the proposed architecture on two large-scale public EEG datasets, showing that
our framework consistently enables student networks with different compression
ratios to effectively learn from the teacher, even when provided with limited
training samples. Lastly, our method achieves state-of-the-art results on one
of the two datasets.
Related papers
- Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation [0.0]
This thesis aims to investigate the feasibility of knowledge transfer between neural networks for medical image segmentation tasks.
In the context of medical imaging, where the data volumes are often limited, leveraging knowledge from a larger pre-trained network could be useful.
arXiv Detail & Related papers (2024-06-05T12:06:04Z) - Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed Knowledge Distillation between Different Distributions" (KD$3$)
We first dynamically select useful training instances from the webly collected data according to the combined predictions of teacher network and student network.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - Cross Architecture Distillation for Face Recognition [49.55061794917994]
We develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge.
Experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-06-26T12:54:28Z) - EmbedDistill: A Geometric Knowledge Distillation for Information
Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR)
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z) - Distilling Knowledge from Self-Supervised Teacher by Embedding Graph
Alignment [52.704331909850026]
We formulate a new knowledge distillation framework to transfer the knowledge from self-supervised pre-trained models to any other student network.
Inspired by the spirit of instance discrimination in self-supervised learning, we model the instance-instance relations by a graph formulation in the feature embedding space.
Our distillation scheme can be flexibly applied to transfer the self-supervised knowledge to enhance representation learning on various student networks.
arXiv Detail & Related papers (2022-11-23T19:27:48Z) - Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph structured data (DFAD-GNN)
To be specific, our DFAD-GNN employs a generative adversarial network, which mainly consists of three components: a pre-trained teacher model and a student model are regarded as two discriminators, and a generator is utilized for deriving training graphs to distill knowledge from the teacher model into the student model.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
arXiv Detail & Related papers (2022-05-08T08:19:40Z) - Oracle Teacher: Leveraging Target Information for Better Knowledge
Distillation of CTC Models [10.941519846908697]
We introduce a new type of teacher model for connectionist temporal classification ( CTC)-based sequence models, namely Oracle Teacher.
Since the Oracle Teacher learns a more accurate CTC alignment by referring to the target information, it can provide the student with more optimal guidance.
Based on a many-to-one mapping property of the CTC algorithm, we present a training strategy that can effectively prevent the trivial solution.
arXiv Detail & Related papers (2021-11-05T14:14:05Z) - Dual Discriminator Adversarial Distillation for Data-free Model
Compression [36.49964835173507]
We propose Dual Discriminator Adversarial Distillation (DDAD) to distill a neural network without any training data or meta-data.
To be specific, we use a generator to create samples through dual discriminator adversarial distillation, which mimics the original training data.
The proposed method obtains an efficient student network which closely approximates its teacher network, despite using no original training data.
arXiv Detail & Related papers (2021-04-12T12:01:45Z) - BENDR: using transformers and a contrastive self-supervised learning
task to learn from massive amounts of EEG data [15.71234837305808]
We consider how to adapt techniques and architectures used for language modelling (LM) to encephalography modelling (EM)
We find that a single pre-trained model is capable of modelling completely novel raw EEG sequences recorded with differing hardware.
Both the internal representations of this model and the entire architecture can be fine-tuned to a variety of downstream BCI and EEG classification tasks.
arXiv Detail & Related papers (2021-01-28T14:54:01Z) - Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.