Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
- URL: http://arxiv.org/abs/2004.13780v2
- Date: Thu, 22 Apr 2021 15:10:21 GMT
- Title: Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
- Authors: Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, and Alessio Del Bue
- Abstract summary: The aim of this paper is to answer two closely related questions: "Is face-voice association language independent?" and "Can a speaker be recognised irrespective of the spoken language?"
To answer them, we collected a Multilingual Audio-Visual dataset containing human speech clips of 154 identities with 3 language annotations, extracted from various videos uploaded online.
- Score: 29.314358875442778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen a surge in work on associating faces and voices in cross-modal biometric applications, alongside speaker recognition. Inspired by this, we introduce the challenging task of establishing associations between faces and voices across multiple languages spoken by the same set of persons. The aim of this paper is to answer two closely related questions: "Is face-voice association language independent?" and "Can a speaker be recognised irrespective of the spoken language?". These two questions are important for understanding the effectiveness of multilingual biometric systems and for driving their development. To answer them, we collected a Multilingual Audio-Visual dataset containing human speech clips of 154 identities with 3 language annotations, extracted from various videos uploaded online. Extensive experiments on the three splits of the proposed dataset were performed to investigate and answer these novel research questions; the results clearly point out the relevance of the multilingual problem.
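The verification task behind these questions can be illustrated with a minimal sketch: two encoders map a face image and a speech clip into a shared embedding space, and a cosine-similarity score against a threshold decides whether the pair belongs to the same identity. The encoders, embedding dimension, and threshold below are illustrative placeholders (random projections), not the architecture or parameters used in the paper.

```python
# Minimal sketch of cross-modal face-voice verification.
# NOTE: the encoders below are random-projection placeholders standing in
# for trained face and speaker encoders; the embedding dimension and the
# decision threshold are assumptions for illustration, not the paper's values.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 256  # assumed size of the shared embedding space

def face_encoder(face_image: np.ndarray) -> np.ndarray:
    """Placeholder for a trained face backbone projected into the shared space."""
    w = rng.standard_normal((face_image.size, EMBED_DIM))
    return face_image.ravel() @ w

def voice_encoder(speech_clip: np.ndarray) -> np.ndarray:
    """Placeholder for a trained speaker encoder projected into the same space."""
    w = rng.standard_normal((speech_clip.size, EMBED_DIM))
    return speech_clip.ravel() @ w

def verify(face_image: np.ndarray, speech_clip: np.ndarray, threshold: float = 0.5):
    """Accept the face-voice pair as one identity if the cosine similarity
    of the two embeddings exceeds the threshold."""
    f = face_encoder(face_image)
    v = voice_encoder(speech_clip)
    score = float(f @ v / (np.linalg.norm(f) * np.linalg.norm(v) + 1e-8))
    return score, score > threshold

# Toy inputs: a 64x64 face crop and one second of 16 kHz audio.
face = rng.standard_normal((64, 64))
clip = rng.standard_normal(16000)
score, same_identity = verify(face, clip)
print(f"similarity={score:.3f}, same identity: {same_identity}")
```

Testing language independence then amounts to repeating this scoring for the same speakers with clips in different languages and checking whether the verification decisions hold, which is what the dataset's language annotations make possible.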
Related papers
- MulliVC: Multi-lingual Voice Conversion With Cycle Consistency [75.59590240034261]
MulliVC is a novel voice conversion system that converts only the timbre while keeping the original content and source-language prosody, without requiring multi-lingual paired data.
Both objective and subjective results indicate that MulliVC significantly surpasses other methods in both monolingual and cross-lingual contexts.
arXiv Detail & Related papers (2024-08-08T18:12:51Z)
- A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge [16.813582262700415]
The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities.
The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers.
arXiv Detail & Related papers (2024-06-22T10:49:36Z)
- Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan [29.23176868272216]
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.
This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge.
arXiv Detail & Related papers (2024-04-14T19:51:32Z)
- LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild [0.0]
This paper presents a semi-automatically annotated audiovisual database to deal with unconstrained natural Spanish.
Results for both speaker-dependent and speaker-independent scenarios are reported using Hidden Markov Models.
arXiv Detail & Related papers (2023-11-21T09:12:21Z)
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z)
- Multilingual Multi-Figurative Language Detection [14.799109368073548]
Figurative language understanding is highly understudied in a multilingual setting.
We introduce multilingual multi-figurative language modelling, and provide a benchmark for sentence-level figurative language detection.
We develop a framework for figurative language detection based on template-based prompt learning.
arXiv Detail & Related papers (2023-05-31T18:52:41Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multi-lingual models with more data outperform monolingual ones but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech [58.93395189153713]
We extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks.
We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes.
Our model shows great improvements over speaker-embedding-based multi-speaker TTS methods.
arXiv Detail & Related papers (2022-11-07T13:35:16Z)
- Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification [2.064612766965483]
We perform spoken LID on three Indian languages code-mixed with English.
This task was organized by the Microsoft Research team as a spoken LID challenge.
arXiv Detail & Related papers (2020-10-14T14:37:03Z)
- Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams [58.617181880383605]
In this work, we propose a novel approach using phonetic posteriorgrams.
Our method doesn't need hand-crafted features and is more robust to noise compared to recent approaches.
Our model is the first to support multilingual/mixlingual speech as input with convincing results.
arXiv Detail & Related papers (2020-06-20T16:32:43Z)
- XPersona: Evaluating Multilingual Personalized Chatbot [76.00426517401894]
We propose a multi-lingual extension of Persona-Chat, namely XPersona.
Our dataset includes persona conversations in six different languages other than English for building and evaluating multilingual personalized agents.
arXiv Detail & Related papers (2020-03-17T07:52:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.