Personalized Keyword Spotting through Multi-task Learning
- URL: http://arxiv.org/abs/2206.13708v1
- Date: Tue, 28 Jun 2022 02:48:34 GMT
- Title: Personalized Keyword Spotting through Multi-task Learning
- Authors: Seunghan Yang, Byeonggeun Kim, Inseop Chung, Simyung Chang
- Abstract summary: We design two personalized KWS tasks: (1) Target user Biased KWS (TB-KWS) and (2) Target user Only KWS (TO-KWS).
To solve the tasks, we propose personalized keyword spotting through multi-task learning (PK-MTL) that consists of multi-task learning and task-adaptation.
We evaluate our framework on conventional and personalized scenarios, and the results show that PK-MTL can dramatically reduce the false alarm rate.
- Score: 6.4423565043274795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyword spotting (KWS) plays an essential role in enabling speech-based user
interaction on smart devices, and conventional KWS (C-KWS) approaches have
concentrated on detecting user-agnostic pre-defined keywords. However, in
practice, most user interactions come from target users enrolled in the device,
which motivates the construction of personalized keyword spotting. We design two
personalized KWS tasks: (1) Target user Biased KWS (TB-KWS) and (2) Target user
Only KWS (TO-KWS). To solve the tasks, we propose personalized keyword spotting
through multi-task learning (PK-MTL), which consists of multi-task learning and
task adaptation. First, we apply multi-task learning to keyword spotting and
speaker verification so that the keyword spotting system can leverage user
information. Next, we design task-specific scoring functions to adapt thoroughly
to the personalized KWS tasks. We evaluate our framework on conventional and
personalized scenarios, and the results show that PK-MTL can dramatically
reduce the false alarm rate, especially in various practical scenarios.
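The two components named in the abstract (a joint KWS and speaker-verification objective, and task-specific scoring) can be illustrated with a minimal Python sketch. It is not the authors' PK-MTL implementation: the loss weight `sv_weight`, the additive bias in `score_tb_kws`, the hard speaker threshold in `score_to_kws`, and all function names are hypothetical choices made only to show how a speaker-similarity signal could shift a keyword score differently for C-KWS, TB-KWS, and TO-KWS. Keyword posteriors and speaker embeddings are assumed to come from some already-trained model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def multitask_loss(kws_loss: float, sv_loss: float, sv_weight: float = 0.5) -> float:
    """Hypothetical joint objective: KWS loss plus a weighted speaker-verification loss."""
    return kws_loss + sv_weight * sv_loss


def score_c_kws(keyword_prob: float) -> float:
    """C-KWS: user-agnostic, so only the keyword posterior matters."""
    return keyword_prob


def score_tb_kws(keyword_prob: float, speaker_emb: Sequence[float],
                 enrolled_emb: Sequence[float], bias: float = 0.2) -> float:
    """TB-KWS (hypothetical rule): raise the score for speakers similar to the
    enrolled user, but never lower it, so other users can still trigger the keyword."""
    sim = cosine_similarity(speaker_emb, enrolled_emb)
    return min(1.0, keyword_prob + bias * max(0.0, sim))


def score_to_kws(keyword_prob: float, speaker_emb: Sequence[float],
                 enrolled_emb: Sequence[float], sv_threshold: float = 0.5) -> float:
    """TO-KWS (hypothetical rule): zero the score unless the speaker matches the enrolled user."""
    sim = cosine_similarity(speaker_emb, enrolled_emb)
    return keyword_prob if sim >= sv_threshold else 0.0


if __name__ == "__main__":
    enrolled = [1.0, 0.0]
    target, other = [0.9, 0.1], [0.0, 1.0]
    print(score_to_kws(0.8, target, enrolled), score_to_kws(0.8, other, enrolled))
    print(score_tb_kws(0.6, target, enrolled), score_tb_kws(0.6, other, enrolled))
```

With an enrolled embedding `[1.0, 0.0]`, an utterance from a dissimilar speaker such as `[0.0, 1.0]` scores 0.0 under `score_to_kws` however confident the keyword posterior is, whereas `score_tb_kws` leaves that speaker's score unchanged and only raises scores for speakers whose embeddings point toward the enrolled one. The biased variant keeps non-target users able to trigger the device; the target-only variant gives that up to suppress false alarms from other speakers.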
Related papers
- TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit [60.31175803899285]
We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings.
TAROT targets semi-structured text in profiles and jobs, and it is co-pretrained with multi-grained pretraining tasks to constrain the acquired semantic information at each level.
Date: 2024-01-15T07:57:58Z
- CompoSuite: A Compositional Reinforcement Learning Benchmark [20.89464587308586]
We present CompoSuite, an open-source benchmark for compositional multi-task reinforcement learning (RL).
Each CompoSuite task requires a particular robot arm to manipulate one individual object to achieve a task objective while avoiding an obstacle.
We benchmark existing single-task, multi-task, and compositional learning algorithms on various training settings, and assess their capability to compositionally generalize to unseen tasks.
Date: 2022-07-08T22:01:52Z
- Few-Shot Stance Detection via Target-Aware Prompt Distillation [48.40269795901453]
This paper is inspired by the potential capability of pre-trained language models (PLMs) serving as knowledge bases and few-shot learners.
PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts.
Considering the crucial role of the target in the stance detection task, we design target-aware prompts and propose a novel verbalizer.
Date: 2022-06-27T12:04:14Z
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
Date: 2022-05-06T07:31:28Z
- On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting [51.41426141283203]
User-defined keyword spotting is a task to detect new spoken terms defined by users.
Previous works try to incorporate self-supervised learning models or apply meta-learning algorithms.
Our results show that HuBERT combined with a Matching Network achieves the best result.
Date: 2022-04-01T10:59:39Z
- Learning Decoupling Features Through Orthogonality Regularization [55.79910376189138]
Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications.
We develop a two-branch deep network (KWS branch and SV branch) with the same network structure.
A novel decoupling feature learning method is proposed to push up the performance of KWS and SV simultaneously.
Date: 2022-03-31T03:18:13Z
- Multi-task Learning with Cross Attention for Keyword Spotting [8.103605110339519]
Keyword spotting (KWS) is an important technique for speech applications, which enables users to activate devices by speaking a keyword phrase.
There is a mismatch between the training criterion (phoneme recognition) and the target task (KWS).
Recently, multi-task learning has been applied to KWS to exploit both ASR and KWS training data.
Date: 2021-07-15T22:38:16Z
- Teaching keyword spotters to spot new keywords with limited examples [6.251896411370577]
We present KeySEM, a speech embedding model pre-trained on the task of recognizing a large number of keywords.
KeySEM is well suited to on-device environments where post-deployment learning and ease of customization are often desirable.
Date: 2021-06-04T12:43:36Z
- Auto-KWS 2021 Challenge: Task, Datasets, and Baselines [63.82759886293636]
Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.
The challenge focuses on the problem of customized keyword spotting, where the target device can only be awakened by an enrolled speaker with their specified keyword.
Date: 2021-03-31T14:56:48Z
- Few-Shot Keyword Spotting With Prototypical Networks [3.6930948691311016]
Keyword spotting has been widely used in many voice interfaces such as Amazon's Alexa and Google Home.
We first formulate this problem as a few-shot keyword spotting problem and approach it using metric learning.
We then propose a solution to the prototypical few-shot keyword spotting problem using temporal and dilated convolutions on networks.
Date: 2020-07-25T20:17:56Z
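The prototypical-network formulation in the last entry can be sketched in a few lines of Python. This is a generic illustration of prototypical classification, not the cited paper's model: the embeddings are assumed to come from some encoder (the temporal and dilated convolutional network is not reproduced), and `prototype`, `squared_euclidean`, and `classify` are hypothetical names. Each keyword's prototype is the mean of its few support embeddings, and a query is scored by a softmax over negative squared Euclidean distances to the prototypes.

```python
import math
from typing import Dict, List, Sequence


def prototype(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Class prototype: element-wise mean of the support embeddings for one keyword."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Sequence[float],
             support: Dict[str, Sequence[Sequence[float]]]) -> Dict[str, float]:
    """Softmax over keywords, using negative squared distance to each prototype as the logit."""
    protos = {label: prototype(embs) for label, embs in support.items()}
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - max_logit) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}


if __name__ == "__main__":
    support = {"yes": [[1.0, 0.0], [0.8, 0.2]], "no": [[0.0, 1.0], [0.2, 0.8]]}
    probs = classify([0.9, 0.1], support)
    print(max(probs, key=probs.get), probs)
```

With two support clips each for "yes" and "no" embedded near `[1, 0]` and `[0, 1]`, a query at `[0.9, 0.1]` coincides with the "yes" prototype and receives most of the probability mass. Adding a new keyword in this setup only requires a few support embeddings; no classifier weights are retrained.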