Application-Agnostic Language Modeling for On-Device ASR
- URL: http://arxiv.org/abs/2305.09764v1
- Date: Tue, 16 May 2023 19:31:18 GMT
- Title: Application-Agnostic Language Modeling for On-Device ASR
- Authors: Markus Nußbaum-Thom, Lyan Verwimp, Youssef Oualil
- Abstract summary: On-device automatic speech recognition systems face several challenges compared to server-based systems.
They have to meet stricter constraints in terms of speed, disk size and memory.
One of our novel approaches reduces the disk size by half while maintaining the speed and accuracy of the original model.
- Score: 6.03523493247947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device automatic speech recognition systems face several challenges
compared to server-based systems. They have to meet stricter constraints in
terms of speed, disk size and memory while maintaining the same accuracy. Often
they have to serve several applications with different distributions at once,
such as communicating with a virtual assistant and speech-to-text. The simplest
solution to serve multiple applications is to build application-specific
(language) models, but this leads to an increase in memory. Therefore, we
explore different data- and architecture-driven language modeling approaches to
build a single application-agnostic model. We propose two novel feed-forward
architectures that find an optimal trade-off between different on-device
constraints. In comparison to the application-specific solution, one of our
novel approaches reduces the disk size by half while maintaining the speed and
accuracy of the original model.
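As a rough illustration only (the paper publishes no code and its two novel architectures are not specified here), the sketch below shows a generic fixed-context feed-forward neural language model of the kind on-device ASR systems use; the vocabulary size, context length, and layer widths are assumptions, and the single shared model for all applications mirrors the application-agnostic idea rather than the paper's exact design.
```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Fixed-context feed-forward LM: embeds the previous `context` tokens,
    concatenates them, and predicts the next token. Sizes are illustrative."""
    def __init__(self, vocab_size=10000, context=3, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(context * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, ctx_tokens):              # ctx_tokens: (batch, context)
        e = self.embed(ctx_tokens)               # (batch, context, embed_dim)
        return self.ff(e.flatten(start_dim=1))   # (batch, vocab_size) logits

# One application-agnostic model scores text from any application
# (virtual-assistant commands, dictation, ...) instead of one model per application.
lm = FeedForwardLM()
logits = lm(torch.randint(0, 10000, (2, 3)))
print(logits.shape)  # torch.Size([2, 10000])
```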
Related papers
- Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients [12.008071873475169]
Federated learning is a technique that collaboratively learns a shared prediction model while keeping the data local on different clients.
We propose using dynamical architectures which, employing early-exit solutions, can adapt their processing depending on the input and on the operation conditions.
This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward.
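A minimal sketch of the early-exit idea, assuming intermediate classifier heads and a confidence threshold (both illustrative, not the paper's exact design): each client runs only as many blocks as its compute budget allows and can stop early once a head is confident.
```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Stack of blocks with a classifier head after each block; a client can
    exit early once a head is confident enough or its depth budget is spent."""
    def __init__(self, dim=32, num_classes=10, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(depth))

    def forward(self, x, max_depth=4, threshold=0.9):
        for block, head in zip(self.blocks[:max_depth], self.heads[:max_depth]):
            x = torch.relu(block(x))
            probs = head(x).softmax(dim=-1)
            if probs.max() >= threshold:   # confident enough: exit early
                return probs
        return probs                        # otherwise use the deepest head reached

model = EarlyExitEncoder()
out = model(torch.randn(1, 32), max_depth=2)  # low-end client: only 2 blocks
```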
arXiv Detail & Related papers (2024-05-27T17:32:37Z)
- Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices [28.06179341376626]
We introduce a gated low-rank adaptation(GLoRA) for parameter-efficient fine-tuning with minimal performance degradation.
Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch.
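A minimal sketch of gated low-rank adaptation applied to a single frozen linear layer; the scalar sigmoid gate and the rank used here are assumptions about the general idea, not GLoRA's exact formulation.
```python
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Frozen base layer plus a low-rank update (A @ B) scaled by a learned gate.
    Only A, B, and the gate are trained, so fine-tuning touches few parameters."""
    def __init__(self, base: nn.Linear, rank=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # keep pretrained weights frozen
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.gate = nn.Parameter(torch.zeros(1))   # learned gate on the adapter path

    def forward(self, x):
        update = (x @ self.lora_a) @ self.lora_b
        return self.base(x) + torch.sigmoid(self.gate) * update

layer = GatedLoRALinear(nn.Linear(256, 256), rank=4)
y = layer(torch.randn(2, 256))
```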
arXiv Detail & Related papers (2024-04-24T01:31:39Z)
- Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models [18.50444234955465]
We propose a lightweight neural network to map the embeddings from two independent models to a shared speaker embedding space.
Our results show that this approach significantly outperforms cosine scoring in a shared speaker logit space for models that were trained with a contrastive loss on large datasets with many speaker identities.
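A minimal sketch of the alignment idea, assuming a small MLP that maps runtime-model embeddings into the enrollment model's space before scoring; the dimensions, architecture, and the cosine comparison shown here are illustrative, not the paper's exact setup.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingAligner(nn.Module):
    """Lightweight network mapping embeddings from a runtime speaker model
    into the space of an independently trained enrollment model."""
    def __init__(self, dim_in=192, dim_out=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, hidden), nn.ReLU(), nn.Linear(hidden, dim_out)
        )

    def forward(self, runtime_emb):
        return self.net(runtime_emb)

aligner = EmbeddingAligner()
runtime_emb = torch.randn(1, 192)   # from the runtime recognition model
enroll_emb = torch.randn(1, 256)    # from the enrollment model
score = F.cosine_similarity(aligner(runtime_emb), enroll_emb)
```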
arXiv Detail & Related papers (2024-01-23T02:19:31Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
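A minimal sketch contrasting multiclass and multilabel scoring for spoken language recognition; the language set, threshold, and rejection rule are assumptions used only to illustrate why a multilabel formulation can reject unseen non-target languages.
```python
import torch

languages = ["en", "es", "de", "fr"]
logits = torch.randn(1, len(languages))   # placeholder output of an audio encoder

# Multiclass: softmax forces exactly one of the known languages,
# even when the utterance is in none of them.
multiclass_pred = languages[logits.softmax(dim=-1).argmax().item()]

# Multilabel: one sigmoid per language; if no score clears the threshold,
# the utterance can be rejected as a non-target language.
probs = torch.sigmoid(logits)
accepted = [lang for lang, p in zip(languages, probs[0]) if p > 0.5]
multilabel_pred = accepted or ["<non-target>"]
```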
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
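A minimal sketch of the adapter pattern that adapter-ALBERT builds on: a frozen shared backbone plus small per-task bottleneck adapters, so the large weights can be reused (and kept in denser memory) across tasks; the layer sizes and task names are made up.
```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck module added to a frozen backbone; only adapters differ
    between tasks, so the large shared weights stay resident while the tiny
    task-specific weights are swapped."""
    def __init__(self, dim=128, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

backbone = nn.Linear(128, 128)           # stand-in for a frozen transformer layer
for p in backbone.parameters():
    p.requires_grad = False

adapters = {"sentiment": Adapter(), "qa": Adapter()}   # hypothetical tasks
x = torch.randn(1, 128)
y = adapters["qa"](torch.relu(backbone(x)))
```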
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
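A toy sketch of the big-little decoding loop with both models stubbed out; the confidence-threshold fallback rule here is a simplification of the idea, not BiLD's exact policy.
```python
import random

def little_model_propose(prefix):
    """Stand-in for a small, fast decoder: returns (token, confidence)."""
    return "tok_small", random.random()

def big_model_predict(prefix):
    """Stand-in for a large, slow decoder: returns a token."""
    return "tok_big"

def big_little_decode(prompt, max_len=10, confidence_threshold=0.7):
    tokens = list(prompt)
    while len(tokens) < max_len:
        token, conf = little_model_propose(tokens)
        if conf < confidence_threshold:
            # Fall back to the big model only on low-confidence steps,
            # so most steps pay only the small model's cost.
            token = big_model_predict(tokens)
        tokens.append(token)
    return tokens

print(big_little_decode(["<s>"]))
```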
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification [0.013428344011390777]
We propose a new approach to enable multilingual speech recognition on edge devices.
This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on-the-fly.
Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.
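A minimal sketch of the on-the-fly selection described above, with placeholder language/accent classifiers and a hypothetical model registry; only the one matching monolingual model is loaded per utterance instead of keeping every language's model in memory.
```python
# Hypothetical registry of monolingual recognizers, keyed by (language, accent).
MODEL_PATHS = {
    ("en", "us"): "asr_en_us.bin",
    ("en", "in"): "asr_en_in.bin",
    ("hi", "generic"): "asr_hi.bin",
}

def identify_language(audio):            # placeholder lightweight LID classifier
    return "en"

def identify_accent(audio, language):    # placeholder accent classifier
    return "in"

def transcribe(audio, load_model):
    lang = identify_language(audio)
    accent = identify_accent(audio, lang)
    # Load only the single matching monolingual model for this utterance.
    path = MODEL_PATHS.get((lang, accent), MODEL_PATHS[("en", "us")])
    model = load_model(path)
    return model(audio)

dummy_loader = lambda path: (lambda audio: f"<transcript via {path}>")
print(transcribe(b"raw-audio-bytes", dummy_loader))
```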
arXiv Detail & Related papers (2021-08-04T12:59:53Z)
- A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework [0.0]
We train a new baseline acoustic model and three language models for use with the Coqui STT framework.
Results look promising, but further epochs of training and parameter sweeping are needed to reach a production-level accuracy.
arXiv Detail & Related papers (2021-07-19T14:17:42Z)
- NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search [100.71365025972258]
We propose NAS-BERT, an efficient method for BERT compression.
NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency.
Experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches.
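A minimal sketch of the supernet idea behind NAS-BERT: one over-parameterized network is trained, and sub-models of different depth and width are read out to meet a size or latency budget; the slicing scheme and sizes here are illustrative, not NAS-BERT's actual search space.
```python
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """Over-parameterized stack trained once; a sub-model uses only a prefix of
    the layers and a slice of each layer's width."""
    def __init__(self, max_dim=64, max_depth=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(max_dim, max_dim) for _ in range(max_depth))

    def forward(self, x, depth=4, width=64):
        for layer in self.layers[:depth]:
            w = layer.weight[:width, :x.shape[-1]]   # slice weights to the chosen width
            b = layer.bias[:width]
            x = torch.relu(x @ w.t() + b)
        return x

supernet = SuperNet()
x = torch.randn(1, 64)
small = supernet(x, depth=2, width=32)   # compressed sub-model
large = supernet(x, depth=4, width=64)   # full-capacity sub-model
```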
arXiv Detail & Related papers (2021-05-30T07:20:27Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese languages.
We train these models on large amounts of data, achieving significantly improved performance from the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search [127.56834100382878]
We propose LightSpeech to automatically design more lightweight and efficient TTS models based on FastSpeech.
Experiments show that the model discovered by our method achieves a 15x model compression ratio and a 6.5x inference speedup on CPU with on-par voice quality.
arXiv Detail & Related papers (2021-02-08T07:45:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.