Personalizing Keyword Spotting with Speaker Information
- URL: http://arxiv.org/abs/2311.03419v1
- Date: Mon, 6 Nov 2023 12:16:06 GMT
- Title: Personalizing Keyword Spotting with Speaker Information
- Authors: Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati,
Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno
- Abstract summary: Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups.
We propose a novel approach that integrates speaker information into keyword spotting using Feature-wise Linear Modulation (FiLM).
Our proposed approach only requires a small 1% increase in the number of parameters, with a minimum impact on latency and computational cost.
- Score: 11.4457776449367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword spotting systems often struggle to generalize to a diverse population
with various accents and age groups. To address this challenge, we propose a
novel approach that integrates speaker information into keyword spotting using
Feature-wise Linear Modulation (FiLM), a recent method for learning from
multiple sources of information. We explore both Text-Dependent and
Text-Independent speaker recognition systems to extract speaker information,
and we experiment on extracting this information from both the input audio and
pre-enrolled user audio. We evaluate our systems on a diverse dataset and
achieve a substantial improvement in keyword detection accuracy, particularly
among underrepresented speaker groups. Moreover, our proposed approach only
requires a small 1% increase in the number of parameters, with a minimum impact
on latency and computational cost, which makes it a practical solution for
real-world applications.
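As a rough illustration of the FiLM mechanism named in the abstract, the sketch below applies a per-channel scale (gamma) and shift (beta), both predicted from a speaker embedding, to every frame of a feature sequence. This follows the general definition of FiLM; the layer sizes, the toy weights, the helper names `linear` and `film`, and the point in the keyword spotting network where the modulation is applied are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Compute x @ weights + bias, where weights has shape (len(x), len(bias))."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def film(
    frames: Matrix,
    speaker_embedding: Vector,
    w_gamma: Matrix,
    b_gamma: Vector,
    w_beta: Matrix,
    b_beta: Vector,
) -> Matrix:
    """Feature-wise linear modulation: out = gamma * frame + beta, per channel.

    gamma and beta are linear functions of the speaker embedding and are
    shared across all time frames.
    """
    gamma = linear(speaker_embedding, w_gamma, b_gamma)
    beta = linear(speaker_embedding, w_beta, b_beta)
    return [
        [g * f + b for f, g, b in zip(frame, gamma, beta)]
        for frame in frames
    ]


# Toy, hypothetical values: 2 frames, 3 feature channels, 2-dim speaker embedding.
frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
speaker_embedding = [1.0, -1.0]
w_gamma = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
b_gamma = [1.0, 1.0, 1.0]  # a bias of 1 keeps gamma near identity scaling
w_beta = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
b_beta = [0.0, 0.0, 0.0]

modulated = film(frames, speaker_embedding, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

In a setup like this, the only added parameters are the two small projections that map the speaker embedding to gamma and beta, which is in keeping with the small parameter overhead the abstract reports.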
Related papers
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
- Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z)
- Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition [12.323309756880581]
Low-resource accented speech recognition is one of the important challenges faced by current ASR technology.
We propose a Conformer-based architecture, called Aformer, to leverage both the acoustic information from large non-accented and limited accented training data.
arXiv Detail & Related papers (2023-06-20T06:08:09Z)
- Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization [7.673971221635779]
We propose methods to extract speaker-related information from semantic content in multi-party meetings.
Experiments on both AISHELL-4 and AliMeeting datasets show that our method achieves consistent improvements over acoustic-only speaker diarization systems.
arXiv Detail & Related papers (2023-05-22T11:14:19Z)
- Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering [78.2927924732142]
We propose speaker-invariant clustering (Spin) as a novel self-supervised learning method.
Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU.
arXiv Detail & Related papers (2023-05-18T15:59:36Z)
- Improving speaker de-identification with functional data analysis of f0 trajectories [10.809893662563926]
Formant modification is a simpler, yet effective method for speaker de-identification which requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed speaker de-identification method will conceal plausibly identifying pitch characteristics in a phonetically controllable manner and improve formant-based speaker de-identification up to 25%.
arXiv Detail & Related papers (2022-03-31T01:34:15Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR [91.87500543591945]
We develop an end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers.
Our experiments show very promising performance in counting accuracy, source separation and speech recognition.
Our system generalizes well to a larger number of speakers than it ever saw during training.
arXiv Detail & Related papers (2020-06-04T11:25:50Z)
- Speaker Diarization with Lexical Information [59.983797884955]
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy.
arXiv Detail & Related papers (2020-04-13T17:16:56Z)