Unified Modeling of Multi-Domain Multi-Device ASR Systems
- URL: http://arxiv.org/abs/2205.06655v1
- Date: Fri, 13 May 2022 14:07:22 GMT
- Title: Unified Modeling of Multi-Domain Multi-Device ASR Systems
- Authors: Soumyajit Mitra, Swayambhu Nath Ray, Bharat Padi, Arunasish Sen,
Raghavendra Bilgi, Harish Arsikere, Shalini Ghosh, Ajay Srinivasamurthy, Sri
Garimella
- Abstract summary: We propose an innovative approach that integrates the different per-domain per-device models into a unified model.
Experiments show that our proposed unified modeling approach actually outperforms the carefully tuned per-domain models.
- Score: 13.61897259469694
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern Automatic Speech Recognition (ASR) systems often use a portfolio of
domain-specific models in order to get high accuracy for distinct user
utterance types across different devices. In this paper, we propose an
innovative approach that integrates the different per-domain per-device models
into a unified model, using a combination of domain embedding, domain experts,
mixture of experts and adversarial training. We run careful ablation studies to
show the benefit of each of these innovations in contributing to the accuracy
of the overall unified model. Experiments show that our proposed unified
modeling approach actually outperforms the carefully tuned per-domain models,
giving relative gains of up to 10% over a baseline model with negligible
increase in the number of parameters.
Related papers
- MoDEM: Mixture of Domain Expert Models [23.846823652305027]
We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs)
We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model.
Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size.
arXiv Detail & Related papers (2024-10-09T23:52:54Z) - GM-DF: Generalized Multi-Scenario Deepfake Detection [49.072106087564144]
Existing face forgery detection usually follows the paradigm of training models in a single domain.
In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets.
arXiv Detail & Related papers (2024-06-28T17:42:08Z) - Stochastic Adversarial Networks for Multi-Domain Text Classification [9.359945319927675]
We introduce the Adversarial Network (SAN), which innovatively models the parameters of the domain-specific feature extractor.
Our approach integrates domain label smoothing and robust pseudo-label regularization to fortify the stability of adversarial training.
The performance of our SAN, evaluated on two leading MDTC benchmarks, demonstrates its competitive edge against the current state-of-the-art methodologies.
arXiv Detail & Related papers (2024-05-28T00:02:38Z) - Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation [14.211024633768986]
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings.
Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations.
This paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters.
arXiv Detail & Related papers (2024-04-02T22:15:48Z) - Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble [11.542472900306745]
Multi-Comprehension (MC) Ensemble is proposed as a strategy to augment the Out-of-Distribution (OOD) feature representation field.
Our experimental results demonstrate the superior performance of the MC Ensemble strategy in OOD detection.
This underscores the effectiveness of our proposed approach in enhancing the model's capability to detect instances outside its training distribution.
arXiv Detail & Related papers (2024-03-24T18:43:04Z) - Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM)
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z) - A Novel Mix-normalization Method for Generalizable Multi-source Person
Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario.
It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains.
We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR)
arXiv Detail & Related papers (2022-01-24T18:09:38Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to injectity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Multi-path Neural Networks for On-device Multi-domain Visual
Classification [55.281139434736254]
This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices.
The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space.
The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths.
arXiv Detail & Related papers (2020-10-10T05:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.