Application-Agnostic Language Modeling for On-Device ASR
- URL: http://arxiv.org/abs/2305.09764v1
- Date: Tue, 16 May 2023 19:31:18 GMT
- Title: Application-Agnostic Language Modeling for On-Device ASR
- Authors: Markus Nußbaum-Thom, Lyan Verwimp, Youssef Oualil
- Abstract summary: On-device automatic speech recognition systems face several challenges compared to server-based systems.
They have to meet stricter constraints in terms of speed, disk size and memory.
One of our novel approaches reduces the disk size by half while maintaining the speed and accuracy of the original model.
- Score: 6.03523493247947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device automatic speech recognition systems face several challenges
compared to server-based systems. They have to meet stricter constraints in
terms of speed, disk size and memory while maintaining the same accuracy. Often
they have to serve several applications with different distributions at once,
such as communicating with a virtual assistant and speech-to-text. The simplest
solution to serve multiple applications is to build application-specific
(language) models, but this leads to an increase in memory. Therefore, we
explore different data- and architecture-driven language modeling approaches to
build a single application-agnostic model. We propose two novel feed-forward
architectures that find an optimal trade-off between different on-device
constraints. In comparison to the application-specific solution, one of our
novel approaches reduces the disk size by half while maintaining the speed and
accuracy of the original model.
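As a rough illustration only (the paper publishes no code and its two novel architectures are not specified here), the sketch below shows a generic fixed-context feed-forward neural language model of the kind on-device ASR systems use; the vocabulary size, context length, and layer widths are assumptions, and the single shared model for all applications mirrors the application-agnostic idea rather than the paper's exact design.
```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Fixed-context feed-forward LM: embeds the previous `context` tokens,
    concatenates them, and predicts the next token. Sizes are illustrative."""
    def __init__(self, vocab_size=10000, context=3, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(context * embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, ctx_tokens):              # ctx_tokens: (batch, context)
        e = self.embed(ctx_tokens)               # (batch, context, embed_dim)
        return self.ff(e.flatten(start_dim=1))   # (batch, vocab_size) logits

# One application-agnostic model scores text from any application
# (virtual-assistant commands, dictation, ...) instead of one model per application.
lm = FeedForwardLM()
logits = lm(torch.randint(0, 10000, (2, 3)))
print(logits.shape)  # torch.Size([2, 10000])
```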
Related papers
- Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients [12.008071873475169]
Federated learning is a technique that collaboratively learns a shared prediction model while keeping the data local on different clients.
We propose using dynamical architectures which, employing early-exit solutions, can adapt their processing depending on the input and on the operation conditions.
This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward.
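A minimal sketch of the early-exit idea, assuming intermediate classifier heads and a confidence threshold (both illustrative, not the paper's exact design): each client runs only as many blocks as its compute budget allows and can stop early once a head is confident.
```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Stack of blocks with a classifier head after each block; a client can
    exit early once a head is confident enough or its depth budget is spent."""
    def __init__(self, dim=32, num_classes=10, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(depth))

    def forward(self, x, max_depth=4, threshold=0.9):
        for block, head in zip(self.blocks[:max_depth], self.heads[:max_depth]):
            x = torch.relu(block(x))
            probs = head(x).softmax(dim=-1)
            if probs.max() >= threshold:   # confident enough: exit early
                return probs
        return probs                        # otherwise use the deepest head reached

model = EarlyExitEncoder()
out = model(torch.randn(1, 32), max_depth=2)  # low-end client: only 2 blocks
```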
arXiv Detail & Related papers (2024-05-27T17:32:37Z)
- Gated Low-rank Adaptation for personalized Code-Switching Automatic Speech Recognition on the low-spec devices [28.06179341376626]
We introduce a gated low-rank adaptation(GLoRA) for parameter-efficient fine-tuning with minimal performance degradation.
Our experiments, conducted on Korean-English code-switching datasets, demonstrate that fine-tuning speech recognition models for code-switching surpasses the performance of traditional code-switching speech recognition models trained from scratch.
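A minimal sketch of gated low-rank adaptation applied to a single frozen linear layer; the scalar sigmoid gate and the rank used here are assumptions about the general idea, not GLoRA's exact formulation.
```python
import torch
import torch.nn as nn

class GatedLoRALinear(nn.Module):
    """Frozen base layer plus a low-rank update (A @ B) scaled by a learned gate.
    Only A, B, and the gate are trained, so fine-tuning touches few parameters."""
    def __init__(self, base: nn.Linear, rank=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # keep pretrained weights frozen
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))
        self.gate = nn.Parameter(torch.zeros(1))   # learned gate on the adapter path

    def forward(self, x):
        update = (x @ self.lora_a) @ self.lora_b
        return self.base(x) + torch.sigmoid(self.gate) * update

layer = GatedLoRALinear(nn.Linear(256, 256), rank=4)
y = layer(torch.randn(2, 256))
```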
arXiv Detail & Related papers (2024-04-24T01:31:39Z)
- Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models [18.50444234955465]
We propose a lightweight neural network to map the embeddings from two independent models to a shared speaker embedding space.
Our results show that this approach significantly outperforms cosine scoring in a shared speaker logit space for models that were trained with a contrastive loss on large datasets with many speaker identities.
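A minimal sketch of the alignment idea, assuming a small MLP that maps runtime-model embeddings into the enrollment model's space before scoring; the dimensions, architecture, and the cosine comparison shown here are illustrative, not the paper's exact setup.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingAligner(nn.Module):
    """Lightweight network mapping embeddings from a runtime speaker model
    into the space of an independently trained enrollment model."""
    def __init__(self, dim_in=192, dim_out=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, hidden), nn.ReLU(), nn.Linear(hidden, dim_out)
        )

    def forward(self, runtime_emb):
        return self.net(runtime_emb)

aligner = EmbeddingAligner()
runtime_emb = torch.randn(1, 192)   # from the runtime recognition model
enroll_emb = torch.randn(1, 256)    # from the enrollment model
score = F.cosine_similarity(aligner(runtime_emb), enroll_emb)
```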
arXiv Detail & Related papers (2024-01-23T02:19:31Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
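A minimal sketch contrasting multiclass and multilabel scoring for spoken language recognition; the language set, threshold, and rejection rule are assumptions used only to illustrate why a multilabel formulation can reject unseen non-target languages.
```python
import torch

languages = ["en", "es", "de", "fr"]
logits = torch.randn(1, len(languages))   # placeholder output of an audio encoder

# Multiclass: softmax forces exactly one of the known languages,
# even when the utterance is in none of them.
multiclass_pred = languages[logits.softmax(dim=-1).argmax().item()]

# Multilabel: one sigmoid per language; if no score clears the threshold,
# the utterance can be rejected as a non-target language.
probs = torch.sigmoid(logits)
accepted = [lang for lang, p in zip(languages, probs[0]) if p > 0.5]
multilabel_pred = accepted or ["<non-target>"]
```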
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
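A minimal sketch of the adapter pattern that adapter-ALBERT builds on: a frozen shared backbone plus small per-task bottleneck adapters, so the large weights can be reused (and kept in denser memory) across tasks; the layer sizes and task names are made up.
```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck module added to a frozen backbone; only adapters differ
    between tasks, so the large shared weights stay resident while the tiny
    task-specific weights are swapped."""
    def __init__(self, dim=128, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

backbone = nn.Linear(128, 128)           # stand-in for a frozen transformer layer
for p in backbone.parameters():
    p.requires_grad = False

adapters = {"sentiment": Adapter(), "qa": Adapter()}   # hypothetical tasks
x = torch.randn(1, 128)
y = adapters["qa"](torch.relu(backbone(x)))
```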
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
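A toy sketch of the big-little decoding loop with both models stubbed out; the confidence-threshold fallback rule here is a simplification of the idea, not BiLD's exact policy.
```python
import random

def little_model_propose(prefix):
    """Stand-in for a small, fast decoder: returns (token, confidence)."""
    return "tok_small", random.random()

def big_model_predict(prefix):
    """Stand-in for a large, slow decoder: returns a token."""
    return "tok_big"

def big_little_decode(prompt, max_len=10, confidence_threshold=0.7):
    tokens = list(prompt)
    while len(tokens) < max_len:
        token, conf = little_model_propose(tokens)
        if conf < confidence_threshold:
            # Fall back to the big model only on low-confidence steps,
            # so most steps pay only the small model's cost.
            token = big_model_predict(tokens)
        tokens.append(token)
    return tokens

print(big_little_decode(["<s>"]))
```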
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification [0.013428344011390777]
We propose a new approach to enable multilingual speech recognition on edge devices.
This approach uses both language identification and accent identification to select one of multiple monolingual ASR models on-the-fly.
Initial results for both recognition performance and resource usage are promising, with our approach using less than 1/12th of the memory consumed by other solutions.
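A minimal sketch of the on-the-fly selection described above, with placeholder language/accent classifiers and a hypothetical model registry; only the one matching monolingual model is loaded per utterance instead of keeping every language's model in memory.
```python
# Hypothetical registry of monolingual recognizers, keyed by (language, accent).
MODEL_PATHS = {
    ("en", "us"): "asr_en_us.bin",
    ("en", "in"): "asr_en_in.bin",
    ("hi", "generic"): "asr_hi.bin",
}

def identify_language(audio):            # placeholder lightweight LID classifier
    return "en"

def identify_accent(audio, language):    # placeholder accent classifier
    return "in"

def transcribe(audio, load_model):
    lang = identify_language(audio)
    accent = identify_accent(audio, lang)
    # Load only the single matching monolingual model for this utterance.
    path = MODEL_PATHS.get((lang, accent), MODEL_PATHS[("en", "us")])
    model = load_model(path)
    return model(audio)

dummy_loader = lambda path: (lambda audio: f"<transcript via {path}>")
print(transcribe(b"raw-audio-bytes", dummy_loader))
```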
arXiv Detail & Related papers (2021-08-04T12:59:53Z)
- A baseline model for computationally inexpensive speech recognition for Kazakh using the Coqui STT framework [0.0]
We train a new baseline acoustic model and three language models for use with the Coqui STT framework.
Results look promising, but further epochs of training and parameter sweeping are needed to reach a production-level accuracy.
arXiv Detail & Related papers (2021-07-19T14:17:42Z)
- NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search [100.71365025972258]
We propose NAS-BERT, an efficient method for BERT compression.
NAS-BERT trains a big supernet on a search space and outputs multiple compressed models with adaptive sizes and latency.
Experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches.
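A minimal sketch of the supernet idea behind NAS-BERT: one over-parameterized network is trained, and sub-models of different depth and width are read out to meet a size or latency budget; the slicing scheme and sizes here are illustrative, not NAS-BERT's actual search space.
```python
import torch
import torch.nn as nn

class SuperNet(nn.Module):
    """Over-parameterized stack trained once; a sub-model uses only a prefix of
    the layers and a slice of each layer's width."""
    def __init__(self, max_dim=64, max_depth=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(max_dim, max_dim) for _ in range(max_depth))

    def forward(self, x, depth=4, width=64):
        for layer in self.layers[:depth]:
            w = layer.weight[:width, :x.shape[-1]]   # slice weights to the chosen width
            b = layer.bias[:width]
            x = torch.relu(x @ w.t() + b)
        return x

supernet = SuperNet()
x = torch.randn(1, 64)
small = supernet(x, depth=2, width=32)   # compressed sub-model
large = supernet(x, depth=4, width=64)   # full-capacity sub-model
```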
arXiv Detail & Related papers (2021-05-30T07:20:27Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese languages.
We train these models on large amounts of data, achieving significantly improved performance from the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search [127.56834100382878]
We propose LightSpeech to automatically design more lightweight and efficient TTS models based on FastSpeech.
Experiments show that the model discovered by our method achieves a 15x model compression ratio and a 6.5x inference speedup on CPU with on-par voice quality.
arXiv Detail & Related papers (2021-02-08T07:45:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.