The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments
- URL: http://arxiv.org/abs/2406.09494v1
- Date: Thu, 13 Jun 2024 17:32:32 GMT
- Title: The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments
- Authors: Shareef Babu Kalluri, Prachi Singh, Pratik Roy Chowdhuri, Apoorva Kulkarni, Shikha Baghel, Pradyoth Hegde, Swapnil Sontakke, Deepak K T, S. R. Mahadeva Prasanna, Deepu Vijayasenan, Sriram Ganapathy,
- Abstract summary: The dataset contains 158 hours of speech, consisting of both supervised and unsupervised mono-channel far-field recordings.
12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages.
We have compared our baseline models and the teams' performances on the evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge.
- Score: 28.460119283649913
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, which involves tasks of speaker diarization (SD) and language diarization (LD) on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced the task of automatic speech recognition (ASR) on this dataset. The dataset, containing 158 hours of speech and consisting of both supervised and unsupervised mono-channel far-field recordings, was released for the LD and SD tracks. Further, 12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages. The details of the dataset, baseline systems and the leaderboard results are highlighted in this paper. We have also compared our baseline models and the teams' performances on evaluation data of DISPLACE-2023 to emphasize the advancements made in this second version of the challenge.
Related papers
- The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results and Findings [18.994388357437924]
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous style voice cloning.
This paper details the data, tracks, submitted systems, evaluation results, and findings.
Date: 2024-10-31T09:39:49Z
- TCG CREST System Description for the Second DISPLACE Challenge [19.387615374726444]
We describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024.
Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios.
Date: 2024-09-16T05:13:34Z
- Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks [62.443665295250035]
We present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023).
In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions.
Date: 2024-07-20T10:13:54Z
- System Description for the Displace Speaker Diarization Challenge 2023 [0.0]
This paper describes our solution for the Diarization of Speaker and Language in Conversational Environments Challenge (Displace 2023).
We used a combination of VAD for finding segments with speech, a ResNet-based CNN for feature extraction from these segments, and spectral clustering for clustering the features.
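The entry above names three stages (VAD, a ResNet-based embedding extractor, and spectral clustering) without giving the clustering details. As an illustration only, here is a minimal Python/NumPy sketch of one common form of the last stage. It assumes per-segment speaker embeddings are already available and proceeds in four steps:

- build a cosine affinity matrix;
- form the symmetric normalized Laplacian;
- apply an eigengap heuristic to guess the number of speakers;
- run k-means on the row-normalized eigenvectors.

The function names, the `max_speakers` cap and the synthetic embeddings are hypothetical and are not taken from that system description.

```python
import numpy as np


def cosine_affinity(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between segment embeddings, negatives clipped to 0."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    aff = np.clip(unit @ unit.T, 0.0, 1.0)
    np.fill_diagonal(aff, 0.0)  # drop self-similarity so each node only links to others
    return aff


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Squared distance from every row to its nearest already-chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(d))])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(emb: np.ndarray, max_speakers: int = 8) -> np.ndarray:
    """Assign a speaker label to each segment embedding (hypothetical helper)."""
    n = len(emb)
    if n < 2:
        return np.zeros(n, dtype=int)
    aff = cosine_affinity(emb)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(aff.sum(axis=1), 1e-10))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * aff * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    # Eigengap heuristic: the speaker count is where the largest jump occurs
    # among the smallest eigenvalues.
    limit = min(max_speakers, n - 1)
    k = int(np.argmax(np.diff(eigvals[: limit + 1]))) + 1
    # Row-normalize the k smallest eigenvectors, then cluster them with k-means.
    spec = eigvecs[:, :k]
    spec = spec / np.maximum(np.linalg.norm(spec, axis=1, keepdims=True), 1e-10)
    return kmeans(spec, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for ResNet embeddings: two "speakers", five segments each.
    spk_a = np.eye(16)[0] + rng.normal(scale=0.05, size=(5, 16))
    spk_b = np.eye(16)[1] + rng.normal(scale=0.05, size=(5, 16))
    print(spectral_cluster(np.vstack([spk_a, spk_b])))
```

Diarization systems often also prune or threshold the affinity matrix before the eigendecomposition. That refinement is omitted here to keep the sketch short.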
Date: 2024-06-20T21:40:02Z
- Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
Date: 2023-12-20T15:12:27Z
- Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments [28.618333018398122]
In multilingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve a mix of languages.
Existing speech technologies may be inefficient at extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers.
The DISPLACE challenge constitutes an open call for evaluating and benchmarking speaker and language diarization technologies under this challenging condition.
Date: 2023-11-21T12:23:58Z
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
Date: 2023-10-09T08:30:01Z
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures recorded by 194 signers.
Date: 2023-05-23T21:00:42Z
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken task-oriented dialogue (TOD).
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
Date: 2023-05-22T13:47:51Z
- VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge [95.6159736804855]
The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
Date: 2023-02-20T19:27:14Z
- "This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II) [3.3263205689999453]
We describe the speech activity detection (SAD), speaker diarization (SD), and automatic speech recognition (ASR) experiments conducted by the Behavox team for the Interspeech 2020 Fearless Steps Challenge (FSC-2).
A relatively small amount of labeled data, a large variety of speakers and channel distortions, and a specific lexicon and speaking style resulted in high error rates for the systems that involved this data.
For all systems, we report substantial performance improvements compared to the FSC-2 baseline systems.
Date: 2020-08-04T13:18:28Z