Incremental Layer-wise Self-Supervised Learning for Efficient Speech
Domain Adaptation On Device
- URL: http://arxiv.org/abs/2110.00155v1
- Date: Fri, 1 Oct 2021 01:22:38 GMT
- Title: Incremental Layer-wise Self-Supervised Learning for Efficient Speech
Domain Adaptation On Device
- Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya
Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays
- Abstract summary: We propose an incremental layer-wise self-supervised learning algorithm for efficient speech domain adaptation on mobile devices.
The proposed algorithm obtains a Word Error Rate (WER) on the target domain $24.2\%$ better than the supervised baseline and costs $89.7\%$ less training memory than the end-to-end self-supervised learning algorithm.
- Score: 24.21909388395124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Streaming end-to-end speech recognition models have been widely applied to
mobile devices and show significant improvement in efficiency. These models are
typically trained on the server using transcribed speech data. However, the
server data distribution can be very different from the data distribution on
user devices, which could affect the model performance. There are two main
challenges for on-device training: limited reliable labels and limited training
memory. While self-supervised learning algorithms can mitigate the mismatch
between domains using unlabeled data, they are not applicable on mobile devices
directly because of the memory constraint. In this paper, we propose an
incremental layer-wise self-supervised learning algorithm for efficient speech
domain adaptation on mobile devices, in which only one layer is updated at a
time. Extensive experimental results demonstrate that the proposed algorithm
obtains a Word Error Rate (WER) on the target domain $24.2\%$ better than
the supervised baseline and costs $89.7\%$ less training memory than the end-to-end
self-supervised learning algorithm.
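As a minimal sketch of the layer-wise idea described above (assuming a PyTorch-style encoder whose layer stack can be iterated and a caller-supplied self-supervised loss; this is not the authors' implementation):

```python
# Illustrative sketch, not the paper's code: adapt a pre-trained streaming
# encoder on unlabeled device audio by updating ONE layer per round.  The
# frozen lower layers run under torch.no_grad(), so activations, gradients
# and optimizer state exist only for the layer currently being trained,
# which is where the memory saving comes from.
import torch

def incremental_layerwise_adapt(layers, featurizer, loader, ssl_loss,
                                steps_per_layer=100, lr=1e-4):
    """`layers` is the encoder's layer stack; `ssl_loss(hidden, batch)` is any
    self-supervised objective (e.g. masked-frame prediction) on the current
    layer's output -- both are assumptions for this sketch."""
    for k, layer in enumerate(layers):                  # one layer per round
        opt = torch.optim.Adam(layer.parameters(), lr=lr)
        for _, batch in zip(range(steps_per_layer), loader):
            with torch.no_grad():                       # frozen lower stack
                x = featurizer(batch)
                for frozen in layers[:k]:
                    x = frozen(x)
            hidden = layer(x)                           # only this layer builds a graph
            loss = ssl_loss(hidden, batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
```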
Related papers
- A Continual and Incremental Learning Approach for TinyML On-device Training Using Dataset Distillation and Model Size Adaption [0.4345992906143838]
A new algorithm for incremental learning in the context of Tiny Machine Learning (TinyML) is presented.
It is optimized for low-performance and energy-efficient embedded devices.
Results show that the proposed algorithm offers a promising approach for TinyML incremental learning on embedded devices.
arXiv Detail & Related papers (2024-09-11T09:02:33Z)
- Embedded Named Entity Recognition using Probing Classifiers [10.573861741540853]
EMBER enables streaming named entity recognition in decoder-only language models without fine-tuning them.
We show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1%.
We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models.
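A probing classifier in this setting is typically a small head trained on frozen hidden states of the language model; the sketch below illustrates that general idea only (it is not EMBER's architecture, and the HuggingFace-style `output_hidden_states` interface and dimensions are assumptions).

```python
# Generic probing-classifier sketch: a linear head tags tokens from frozen
# decoder hidden states, so the LM itself is never fine-tuned and generation
# speed is largely unaffected.
import torch
import torch.nn as nn

class TokenProbe(nn.Module):
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_labels)   # the only trainable part

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim) from a frozen decoder layer
        return self.head(hidden_states)                 # (batch, seq_len, num_labels)

def train_probe(lm, probe, loader, epochs=3, lr=1e-3):
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, labels in loader:                # labels: per-token entity tags
            with torch.no_grad():                       # the LM stays frozen
                hidden = lm(input_ids, output_hidden_states=True).hidden_states[-1]
            logits = probe(hidden)
            loss = loss_fn(logits.flatten(0, 1), labels.flatten())
            opt.zero_grad(); loss.backward(); opt.step()
```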
arXiv Detail & Related papers (2024-03-18T12:58:16Z)
- Layer Attack Unlearning: Fast and Accurate Machine Unlearning via Layer Level Attack and Knowledge Distillation [21.587358050012032]
We propose a fast and novel machine unlearning paradigm at the layer level called layer attack unlearning.
In this work, we introduce the Partial-PGD algorithm to locate the samples to forget efficiently.
We also use Knowledge Distillation (KD) to reliably learn the decision boundaries from the teacher.
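The abstract does not spell out the procedure; purely as a rough illustration of the distillation component (not the Partial-PGD attack or the paper's actual recipe), one common unlearning-style loss distills the teacher on data to retain while pushing predictions on the forget set toward an uninformative target:

```python
# Rough, generic distillation-style unlearning loss (NOT the paper's method):
# follow the teacher on retained data, flatten predictions on the forget set.
import torch
import torch.nn.functional as F

def unlearning_loss(student, teacher, retain_batch, forget_batch, T=2.0):
    with torch.no_grad():
        t_logits = teacher(retain_batch)                        # teacher = original model
    s_retain = student(retain_batch)
    kd = F.kl_div(F.log_softmax(s_retain / T, dim=-1),          # keep teacher behaviour
                  F.softmax(t_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    s_forget = F.log_softmax(student(forget_batch), dim=-1)     # erase: match uniform
    uniform = torch.full_like(s_forget, 1.0 / s_forget.size(-1))
    forget = F.kl_div(s_forget, uniform, reduction="batchmean")
    return kd + forget
```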
arXiv Detail & Related papers (2023-12-28T04:38:06Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
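The core trick is that workers sharing a random seed can regenerate the same perturbation direction locally, so only the scalar projection of the gradient (quantizable to roughly one byte) has to be communicated each step. Below is an illustrative SPSA-style finite-difference sketch of that idea; the function and parameter names are assumptions, not the paper's code.

```python
# Illustrative shared-randomness step: every worker derives the same random
# direction z from a shared seed, estimates the directional derivative with
# two forward passes, and only that (coarsely quantized) scalar is exchanged
# instead of a full gradient vector.
import torch

def shared_randomness_step(params, loss_fn, seed, eps=1e-3, lr=1e-4):
    g = torch.Generator().manual_seed(seed)                  # same seed on every worker
    z = [torch.randn(p.shape, generator=g) for p in params]  # identical direction everywhere

    def perturb(scale):
        with torch.no_grad():
            for p, d in zip(params, z):
                p.add_(scale * eps * d)

    perturb(+1); loss_plus = float(loss_fn())
    perturb(-2); loss_minus = float(loss_fn())
    perturb(+1)                                              # restore original parameters

    proj_grad = (loss_plus - loss_minus) / (2 * eps)         # the only value to transmit
    proj_grad = max(-4.0, min(4.0, proj_grad))               # crude ~1-byte quantization
    proj_grad = round(proj_grad * 32) / 32

    with torch.no_grad():                                    # every worker applies the same update
        for p, d in zip(params, z):
            p.add_(-lr * proj_grad * d)
    return proj_grad
```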
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- Dual Learning for Large Vocabulary On-Device ASR [64.10124092250128]
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming conformer trained on the entirety of Librispeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
arXiv Detail & Related papers (2023-01-11T06:32:28Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network [58.82343017711883]
This paper investigates how to learn directly from unpaired phone sequences and speech utterances.
GAN training is adopted in the first stage to find the mapping between unpaired speech and phone sequences.
In the second stage, another HMM model is introduced to train from the generator's output, which boosts the performance.
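As a very rough sketch of the first-stage adversarial idea only (not the paper's model; the network sizes and frame-level discriminator are assumptions), a generator maps speech features to per-frame phone posteriors while a discriminator tries to tell generated phone sequences from real, unpaired ones:

```python
# Minimal unpaired-mapping GAN sketch: G maps speech features to per-frame
# phone posteriors; D distinguishes real phone sequences from generated ones,
# so no paired speech/transcript data is needed.
import torch
import torch.nn as nn

feat_dim, num_phones = 80, 40                                 # assumed dimensions
G = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                  nn.Linear(256, num_phones), nn.Softmax(dim=-1))
D = nn.Sequential(nn.Linear(num_phones, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def gan_step(speech_feats, real_phone_onehots):
    # speech_feats: (B, T, feat_dim); real_phone_onehots: (B, T, num_phones), unpaired
    B = speech_feats.size(0)
    fake = G(speech_feats)

    d_loss = (bce(D(real_phone_onehots).mean(dim=1), torch.ones(B, 1)) +
              bce(D(fake.detach()).mean(dim=1), torch.zeros(B, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    g_loss = bce(D(fake).mean(dim=1), torch.ones(B, 1))        # fool the discriminator
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```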
arXiv Detail & Related papers (2022-07-29T09:29:28Z)
- Broadcasted Residual Learning for Efficient Keyword Spotting [7.335747584353902]
We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load.
We also propose a novel network architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning.
BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively.
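As a rough sketch of the broadcasting idea (not the exact BC-ResNet block; kernel sizes and normalization are assumptions), the residual branch collapses the frequency axis, runs a cheap temporal convolution, and broadcasts the result back over all frequency bins:

```python
# Broadcasted residual sketch: average over frequency, apply a depthwise
# temporal conv, then broadcast the result back across frequency before the
# residual add.  Not the exact BC-ResNet block.
import torch
import torch.nn as nn

class BroadcastedResidual(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3,
                                  padding=1, groups=channels)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time)
        y = x.mean(dim=2)                        # collapse frequency -> (B, C, T)
        y = torch.relu(self.bn(self.temporal(y)))
        return x + y.unsqueeze(2)                # broadcast over freq, residual add

block = BroadcastedResidual(channels=16)
out = block(torch.randn(2, 16, 40, 100))         # (batch, channels, mel bins, frames)
```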
arXiv Detail & Related papers (2021-06-08T06:55:39Z)
- Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection [60.88952532574564]
This paper conducts a thorough comparison of out-of-domain intent detection methods.
We evaluate multiple contextual encoders and methods, proven to be efficient, on three standard datasets for intent classification.
Our main findings show that fine-tuning Transformer-based encoders on in-domain data leads to superior results.
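For reference, the Mahalanobis detector in this line of work typically fits class-conditional means with a shared covariance on in-domain encoder embeddings and scores an utterance by its minimum distance to any class centroid; below is a generic numpy sketch of that scoring rule (how the embeddings are extracted is left open).

```python
# Generic Mahalanobis out-of-domain scorer: per-class means + shared
# covariance fitted on in-domain embeddings; higher score = more likely OOD.
import numpy as np

def fit_mahalanobis(embeddings: np.ndarray, labels: np.ndarray):
    classes = np.unique(labels)
    means = {c: embeddings[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([embeddings[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(embeddings.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x: np.ndarray, means, cov_inv) -> float:
    # squared Mahalanobis distance to the closest class centroid
    return min(float((x - m) @ cov_inv @ (x - m)) for m in means.values())
```

A detection threshold would then be chosen on held-out in-domain data.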
arXiv Detail & Related papers (2021-01-11T09:10:58Z)
- $DA^3$: Deep Additive Attention Adaption for Memory-Efficient On-Device Multi-Domain Learning [30.53018068935323]
The large memory required for activation storage is the main bottleneck limiting training time and cost on edge devices.
We propose Deep Additive Attention Adaption, a novel memory-efficient on-device multi-domain learning method.
We validate $DA^3$ on multiple datasets against state-of-the-art methods, showing improvements in both accuracy and training time.
arXiv Detail & Related papers (2020-12-02T18:03:18Z)
- Understanding Self-Training for Gradual Domain Adaptation [107.37869221297687]
We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain.
We prove the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error.
The theoretical analysis leads to algorithmic insights, highlighting that regularization and label sharpening are essential even when we have infinite data, and suggesting that self-training works particularly well for shifts with small Wasserstein-infinity distance.
arXiv Detail & Related papers (2020-02-26T08:59:40Z)
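A minimal sketch of gradual self-training as summarized above: pseudo-label each intermediate unlabeled domain with the current model, refit a regularized classifier on the hard (sharpened) pseudo-labels, and carry the model toward the target domain. The scikit-learn classifier is just a stand-in for illustration.

```python
# Gradual self-training sketch: walk through unlabeled domains ordered by
# increasing shift, pseudo-labeling each with the current model and refitting
# on the hard labels.  Regularization (small C) and hard labels ("sharpening")
# mirror the ingredients the analysis highlights.
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(src_X, src_y, intermediate_domains, C=0.1):
    clf = LogisticRegression(C=C, max_iter=1000).fit(src_X, src_y)   # labeled source
    for X_unlabeled in intermediate_domains:                         # gradual shift
        pseudo_y = clf.predict(X_unlabeled)                          # hard pseudo-labels
        clf = LogisticRegression(C=C, max_iter=1000).fit(X_unlabeled, pseudo_y)
    return clf                                                       # applied on the target domain
```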