On-Device Constrained Self-Supervised Speech Representation Learning for
Keyword Spotting via Knowledge Distillation
- URL: http://arxiv.org/abs/2307.02720v1
- Date: Thu, 6 Jul 2023 02:03:31 GMT
- Title: On-Device Constrained Self-Supervised Speech Representation Learning for
Keyword Spotting via Knowledge Distillation
- Authors: Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu
- Abstract summary: We propose a knowledge distillation-based self-supervised speech representation learning architecture for on-device keyword spotting.
Our approach uses a teacher-student framework to transfer knowledge from a larger, more complex model to a smaller, lightweight model.
We evaluated our model's performance on an Alexa keyword spotting detection task using a 16.6k-hour in-house dataset.
- Score: 13.08005728839078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large self-supervised models are effective feature extractors, but their
application is challenging under on-device budget constraints and biased
dataset collection, especially in keyword spotting. To address this, we
propose a knowledge distillation-based self-supervised speech representation
learning (S3RL) architecture for on-device keyword spotting. Our approach uses
a teacher-student framework to transfer knowledge from a larger, more complex
model to a smaller, light-weight model using dual-view cross-correlation
distillation and the teacher's codebook as learning objectives. We evaluated
our model's performance on an Alexa keyword spotting detection task using a
16.6k-hour in-house dataset. Our technique showed exceptional performance in
normal and noisy conditions, demonstrating the efficacy of knowledge
distillation methods in constructing self-supervised models for keyword
spotting tasks while working within on-device resource constraints.
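
The abstract does not spell out the dual-view cross-correlation objective, so the following is a minimal sketch of what a Barlow-Twins-style cross-correlation distillation loss between teacher and student embeddings could look like (Python/PyTorch). Every name, shape, and the specific "dual-view" formulation here is an illustrative assumption rather than the authors' implementation, and the codebook-based objective mentioned in the abstract is not shown.

    # Minimal, assumed sketch of a dual-view cross-correlation distillation
    # loss between a frozen teacher and a small student encoder, in the
    # spirit of Barlow Twins. Names, shapes, and the exact "dual-view"
    # formulation are illustrative assumptions, not the paper's method.
    import torch

    def _cross_corr_loss(a, b, off_diag_weight=5e-3):
        """Cross-correlation loss between two (n, d) matrices over the d axis."""
        a = (a - a.mean(0)) / (a.std(0) + 1e-6)   # standardize each column
        b = (b - b.mean(0)) / (b.std(0) + 1e-6)
        c = (a.T @ b) / a.shape[0]                # (d, d) cross-correlation matrix
        on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()              # matched dims -> 1
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # other dims -> 0
        return on_diag + off_diag_weight * off_diag

    def dual_view_distillation_loss(student_emb, teacher_emb):
        """student_emb, teacher_emb: (batch, dim) pooled speech representations."""
        feature_view = _cross_corr_loss(student_emb, teacher_emb)        # correlate feature dims
        instance_view = _cross_corr_loss(student_emb.T, teacher_emb.T)   # correlate utterances
        return feature_view + instance_view

    # Usage with stand-in tensors for the encoders' outputs.
    student_emb = torch.randn(32, 256, requires_grad=True)   # small student output
    teacher_emb = torch.randn(32, 256)                        # frozen teacher output
    loss = dual_view_distillation_loss(student_emb, teacher_emb)
    loss.backward()

In this sketch, the feature view aligns matched feature dimensions while decorrelating the rest, and the instance view does the same across utterances in the batch; an actual system would apply such a loss to the encoders' frame- or utterance-level outputs during distillation.
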
Related papers
- Generative Model-based Feature Knowledge Distillation for Action Recognition [11.31068233536815]
Our paper introduces an innovative knowledge distillation framework that uses a generative model to train a lightweight student model.
The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets.
(arXiv, 2023-12-14T03:55:29Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
(arXiv, 2023-07-07T04:03:48Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
(arXiv, 2023-03-31T18:03:30Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
(arXiv, 2022-10-06T00:33:01Z)
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
(arXiv, 2021-04-20T09:59:23Z)
- Anomaly Detection in Video via Self-Supervised and Multi-Task Learning [113.81927544121625]
Anomaly detection in video is a challenging computer vision problem.
In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level.
(arXiv, 2020-11-15T10:21:28Z)
- Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity 3D simulated environments such as AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
(arXiv, 2020-10-28T19:27:26Z)
- Empowering Knowledge Distillation via Open Set Recognition for Robust 3D Point Cloud Classification [20.591508284285368]
We propose a joint Knowledge Distillation and Open Set recognition training methodology for three-dimensional object recognition.
We demonstrate the effectiveness of the proposed method through various experiments, showing how it allows us to obtain a much smaller model.
arXiv Detail & Related papers (2020-10-25T13:26:48Z) - Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-06-12T12:18:52Z) - Reducing Overlearning through Disentangled Representations by
Suppressing Unknown Tasks [8.517620051440005]
Existing deep learning approaches for learning visual features tend to overlearn and extract more information than what is required for the task at hand.
From a privacy preservation perspective, the input visual information is not protected from the model.
We propose a model-agnostic solution for reducing model overlearning by suppressing all the unknown tasks.
(arXiv, 2020-05-20T17:31:44Z)
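
As referenced in the "Knowledge Distillation Meets Self-Supervision" entry above, one common way to exploit the similarity between self-supervision signals is relational distillation: matching the pairwise similarity structure of teacher and student features. The sketch below (Python/PyTorch) is an assumed illustration of that general idea, not that paper's actual method.

    # Assumed sketch: transfer the teacher's pairwise similarity structure
    # (computed on self-supervised auxiliary features) to the student.
    # All names and the exact loss are illustrative, not the paper's method.
    import torch
    import torch.nn.functional as F

    def similarity_transfer_loss(student_feats, teacher_feats, temperature=0.1):
        """student_feats: (batch, d_s), teacher_feats: (batch, d_t)."""
        s = F.normalize(student_feats, dim=1)
        t = F.normalize(teacher_feats, dim=1)
        # Row-wise distributions over pairwise cosine similarities.
        student_logp = F.log_softmax(s @ s.T / temperature, dim=1)
        teacher_prob = F.softmax(t @ t.T / temperature, dim=1)
        # KL divergence pulls the student's similarity structure toward the teacher's.
        return F.kl_div(student_logp, teacher_prob, reduction="batchmean")

    # Usage with stand-in features; only the batch sizes must match.
    student_feats = torch.randn(16, 128, requires_grad=True)
    teacher_feats = torch.randn(16, 256)
    loss = similarity_transfer_loss(student_feats, teacher_feats)
    loss.backward()

Because only the similarity structure is matched, the student's feature dimensionality can differ from the teacher's.
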
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.