Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and
LAnguage in Conversational Environments
- URL: http://arxiv.org/abs/2311.12564v3
- Date: Wed, 3 Jan 2024 05:57:32 GMT
- Title: Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and
LAnguage in Conversational Environments
- Authors: Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri,
Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
- Abstract summary: In multilingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve a mix of languages.
Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers.
The DISPLACE challenge constitutes an open call for evaluating and benchmarking speaker and language diarization technologies on this challenging condition.
- Score: 28.618333018398122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multilingual societies, where multiple languages are spoken in a small
geographic vicinity, informal conversations often involve a mix of languages.
Existing speech technologies may be inefficient in extracting information from
such conversations, where the speech data is rich in diversity with multiple
languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in
Conversational Environments) challenge constitutes an open call for evaluating
and benchmarking speaker and language diarization technologies on this
challenging condition. The challenge entailed two tracks: Track-1 focused on
speaker diarization (SD) in multilingual situations, while Track-2 addressed
language diarization (LD) in a multi-speaker scenario. Both tracks were
evaluated using the same underlying audio data. To facilitate this evaluation,
a real-world dataset featuring multilingual, multi-speaker conversational
far-field speech was recorded and distributed. Furthermore, a baseline system
mimicking the state of the art was made available for both the SD and LD
tasks. The challenge garnered a total of 42 registrations worldwide
and received a total of 19 combined submissions for Track-1 and Track-2. This
paper describes the challenge, details of the datasets, tasks, and the baseline
system. Additionally, the paper provides a concise overview of the submitted
systems in both tracks, with emphasis on the top-performing systems.
The paper also presents insights and future perspectives for SD and LD tasks,
focusing on the key challenges that the systems need to overcome before
widespread commercial deployment on such conversations.
Related papers
- TCG CREST System Description for the Second DISPLACE Challenge [19.387615374726444]
We describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024.
Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios.
arXiv Detail & Related papers (2024-09-16T05:13:34Z)
- A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge [16.813582262700415]
The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities.
The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers.
arXiv Detail & Related papers (2024-06-22T10:49:36Z)
- The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments [28.460119283649913]
The dataset contains 158 hours of speech, consisting of both supervised and unsupervised mono-channel far-field recordings.
12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages.
We have compared our baseline models and the team's performances on evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge.
arXiv Detail & Related papers (2024-06-13T17:32:32Z)
- Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan [29.23176868272216]
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under a unique condition of multilingual scenario.
This report provides the details of the challenge, dataset, baselines and task details for the FAME Challenge.
arXiv Detail & Related papers (2024-04-14T19:51:32Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog [67.20796950016735]
Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding.
COD enables dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Crossing the Conversational Chasm: A Primer on Multilingual Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art ToD models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.