VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
- URL: http://arxiv.org/abs/2302.10248v1
- Date: Mon, 20 Feb 2023 19:27:14 GMT
- Title: VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
- Authors: Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman
- Abstract summary: The VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22) was held in conjunction with INTERSPEECH 2022.
The goal of this challenge was to evaluate how well state-of-the-art speaker recognition systems can diarise and recognise speakers from speech obtained "in the wild".
- Score: 95.6159736804855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper summarises the findings from the VoxCeleb Speaker Recognition
Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH
2022. The goal of this challenge was to evaluate how well state-of-the-art
speaker recognition systems can diarise and recognise speakers from speech
obtained "in the wild". The challenge consisted of: (i) the provision of
publicly available speaker recognition and diarisation data from YouTube videos
together with ground truth annotation and standardised evaluation software; and
(ii) a public challenge and hybrid workshop held at INTERSPEECH 2022. We
describe the four tracks of our challenge along with the baselines, methods,
and results. We conclude with a discussion on the new domain-transfer focus of
VoxSRC-22, and on the progression of the challenge from the previous three
editions.
Related papers
- CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge [12.178918299455898]
The challenge yielded highly successful results, with the best submission significantly outperforming the baseline.
This paper comprehensively reviews the challenge, encompassing the data profile, task specifications, and baseline system construction.
arXiv Detail & Related papers (2024-06-14T12:49:38Z)
- The Second DISPLACE Challenge : DIarization of SPeaker and LAnguage in Conversational Environments [28.460119283649913]
The dataset contains 158 hours of speech, consisting of both supervised and unsupervised mono-channel far-field recordings.
12 hours of close-field mono-channel recordings were provided for the ASR track conducted on 5 Indian languages.
We compared our baseline models and the teams' performance on the DISPLACE-2023 evaluation data to highlight the advances made in this second edition of the challenge.
arXiv Detail & Related papers (2024-06-13T17:32:32Z)
- ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge [94.13624830833314]
This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle.
First-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track.
arXiv Detail & Related papers (2024-01-07T12:51:42Z)
- Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z)
- Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge [0.0]
This paper presents our efforts to build a robust ASR model for the shared task "Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese" (SE&R 2022).
The goal of the challenge is to advance ASR research for the Portuguese language, considering prepared and spontaneous speech in different dialects.
arXiv Detail & Related papers (2022-07-29T00:48:40Z)
- VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge [99.82500204110015]
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020.
The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or 'in the wild' data.
This paper outlines the challenge, and describes the baselines, methods used, and results.
arXiv Detail & Related papers (2020-12-12T17:20:57Z)
- The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) [186.7816349401443]
We present a new video understanding pentathlon challenge, an open competition held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
The objective of the challenge was to explore and evaluate new methods for text-to-video retrieval.
arXiv Detail & Related papers (2020-08-03T09:55:26Z)
- CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings [87.37967358673252]
We organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).
The challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition.
This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition and unsegmented multispeaker speech recognition.
arXiv Detail & Related papers (2020-04-20T12:59:07Z)