Server-side Rescoring of Spoken Entity-centric Knowledge Queries for
Virtual Assistants
- URL: http://arxiv.org/abs/2311.01398v1
- Date: Thu, 2 Nov 2023 17:07:23 GMT
- Title: Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants
- Authors: Youyuan Zhang, Sashank Gondala, Thiago Fraga-Silva, Christophe Van Gysel
- Abstract summary: We conduct an empirical study of modeling strategies for server-side rescoring of spoken information domain queries.
We demonstrate significant WER improvements of 23%-35% on various entity-centric query subpopulations.
We also show that model fusion of multiple server-side LMs trained from scratch most effectively combines complementary strengths of each model.
- Score: 5.996525771249284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition
(ASR) require effective knowledge integration for the challenging entity-rich
query recognition. In this paper, we conduct an empirical study of modeling
strategies for server-side rescoring of spoken information domain queries using
various categories of Language Models (LMs) (N-gram word LMs, sub-word neural
LMs). We investigate the combination of on-device and server-side signals, and
demonstrate significant WER improvements of 23%-35% on various entity-centric
query subpopulations by integrating various server-side LMs compared to
performing ASR on-device only. We also perform a comparison between LMs trained
on domain data and a GPT-3 variant offered by OpenAI as a baseline.
Furthermore, we also show that model fusion of multiple server-side LMs trained
from scratch most effectively combines complementary strengths of each model
and integrates knowledge learned from domain-specific data to a VA ASR system.
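The abstract describes rescoring on-device ASR output with several server-side LMs and fusing their signals. A minimal sketch of that general idea, assuming n-best rescoring with a weighted sum of log-scores, follows. The abstract does not specify the fusion formula, so the `Hypothesis` fields, the weight values, the LM names, and the toy scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    text: str
    # Log-score from the on-device ASR pass (hypothetical field).
    on_device_score: float
    # Log-probabilities assigned by each server-side LM, keyed by an
    # illustrative LM name (hypothetical field).
    server_lm_scores: Dict[str, float]


def fused_score(hyp: Hypothesis,
                lm_weights: Dict[str, float],
                on_device_weight: float = 1.0) -> float:
    """Weighted sum of the on-device score and the server-side LM scores."""
    lm_term = sum(weight * hyp.server_lm_scores[name]
                  for name, weight in lm_weights.items())
    return on_device_weight * hyp.on_device_score + lm_term


def rescore(nbest: List[Hypothesis],
            lm_weights: Dict[str, float],
            on_device_weight: float = 1.0) -> List[Hypothesis]:
    """Return the n-best list sorted from best to worst fused score."""
    return sorted(nbest,
                  key=lambda h: fused_score(h, lm_weights, on_device_weight),
                  reverse=True)


# Toy n-best list: the on-device pass slightly prefers the misspelled entity,
# while both (made-up) server-side LM scores prefer the correct one.
nbest = [
    Hypothesis("who is tailor swift", -4.0, {"ngram": -9.0, "neural": -8.5}),
    Hypothesis("who is taylor swift", -4.2, {"ngram": -6.0, "neural": -5.5}),
]
weights = {"ngram": 0.3, "neural": 0.5}  # illustrative, untuned weights

best = rescore(nbest, weights)[0]
print(best.text)
```

In practice the weights would be tuned on held-out data. Setting every LM weight to zero falls back to the on-device ranking, which makes the effect of the server-side signals easy to isolate.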
Related papers
- Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets [22.29915616018026]
Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks.
Our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules.
We introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information.
arXiv Detail & Related papers (2024-05-03T14:35:58Z)
- Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
- A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision [0.20999222360659603]
This work proposes a referenceless quality metric, which allows comparing the performance of different ASR models on a speech dataset without ground truth transcriptions.
To estimate the quality of ASR hypotheses, a pre-trained language model (LM) is fine-tuned with contrastive learning in a self-supervised learning manner.
The proposed referenceless metric obtains a much higher correlation with WER scores and their ranks than the perplexity metric from the state-of-art multi-lingual LM in all experiments.
arXiv Detail & Related papers (2023-06-21T21:33:39Z)
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
- Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
- Integrating Categorical Features in End-to-End ASR [1.332560004325655]
All-neural, end-to-end ASR systems convert speech input to text units using a single trainable neural network model.
E2E models require large amounts of paired speech text data that is expensive to obtain.
We propose a simple yet effective way to integrate categorical features into E2E model.
arXiv Detail & Related papers (2021-10-06T20:07:53Z)
- Multimodal Federated Learning [9.081857621783811]
In many applications, such as smart homes with IoT devices, local data on clients are generated from different modalities.
Existing federated learning systems only work on local data from a single modality, which limits the scalability of the systems.
We propose a multimodal and semi-supervised federated learning framework that trains autoencoders to extract shared or correlated representations from different local data modalities on clients.
arXiv Detail & Related papers (2021-09-10T12:32:46Z)
- Arabic Code-Switching Speech Recognition using Monolingual Data [13.513655231184261]
Code-switching in automatic speech recognition (ASR) is an important challenge due to globalization.
Recent research in multilingual ASR shows potential improvement over monolingual systems.
We study key issues related to multilingual modeling for ASR through a series of large-scale ASR experiments.
arXiv Detail & Related papers (2021-07-04T08:40:49Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource language to languages with low resources.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.