Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
- URL: http://arxiv.org/abs/2501.17304v1
- Date: Tue, 28 Jan 2025 21:25:08 GMT
- Title: Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
- Authors: Igor Abramovski, Alon Vinnikov, Shalev Shaer, Naoyuki Kanda, Xiaofei Wang, Amir Ivry, Eyal Krupka,
- Abstract summary: The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks.
The challenge provides a unique combination of 280 recorded meetings across 30 diverse environments, capturing real-world acoustic conditions and conversational dynamics.
We provide an overview of the systems submitted to the challenge and analyze the top-performing approaches.
- Abstract: The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks by offering datasets more representative of the needs of real-world business applications than those previously available. The challenge provides a unique combination of 280 recorded meetings across 30 diverse environments, capturing real-world acoustic conditions and conversational dynamics, and a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. In this paper, we provide an overview of the systems submitted to the challenge and analyze the top-performing approaches, hypothesizing the factors behind their success. Additionally, we highlight promising directions left unexplored by participants. By presenting key findings and actionable insights, this work aims to drive further innovation and progress in DASR research and applications.
Related papers
- Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis [3.210706100833053]
We propose and implement a fully integrated system that replaces conventional AFE models with OpenAI's Whisper.
We show that Whisper not only accelerates processing but also improves specific aspects of rendering quality, resulting in more realistic and responsive talking-head interactions.
arXiv Detail & Related papers (2024-11-20T11:18:05Z) - Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks [112.7791602217381]
We present Dynamic-SUPERB Phase-2, an open benchmark for the comprehensive evaluation of instruction-based universal speech models.
Building upon the first generation, this second version incorporates 125 new tasks, expanding the benchmark to a total of 180 tasks.
Evaluation results indicate that none of the models performed well universally.
arXiv Detail & Related papers (2024-11-08T06:33:22Z) - INQUIRE: A Natural World Text-to-Image Retrieval Benchmark [51.823709631153946]
We introduce INQUIRE, a text-to-image retrieval benchmark designed to challenge multimodal vision-language models on expert-level queries.
INQUIRE includes iNaturalist 2024 (iNat24), a new dataset of five million natural world images, along with 250 expert-level retrieval queries.
Our benchmark evaluates two core retrieval tasks: (1) INQUIRE-Fullrank, a full dataset ranking task, and (2) INQUIRE-Rerank, a reranking task for refining top-100 retrievals.
arXiv Detail & Related papers (2024-11-04T19:16:53Z) - DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition [51.96660522869841]
DailyDVS-200 is a benchmark dataset tailored for the event-based action recognition community.
It covers 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences.
DailyDVS-200 is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions.
arXiv Detail & Related papers (2024-07-06T15:25:10Z) - NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription [21.236634241186458]
We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge alongside datasets and a baseline system.
The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios.
arXiv Detail & Related papers (2024-01-16T23:50:26Z) - Perception Test 2023: A Summary of the First Challenge And Outcome [67.0525378209708]
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023.
The goal was to benchmark state-of-the-art video models on the recently proposed Perception Test benchmark.
We summarise in this report the task descriptions, metrics, baselines, and results.
arXiv Detail & Related papers (2023-12-20T15:12:27Z) - FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data [82.5767720132393]
This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024.
This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology.
arXiv Detail & Related papers (2023-11-17T12:15:40Z) - DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation [12.734839065028547]
This paper proposes a real-time cross-attention deep model named DeepVQE, based on residual convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We conduct ablation studies to analyze the contributions of different components of our model to the overall performance.
DeepVQE achieves state-of-the-art performance on nonpersonalized tracks from the ICASSP 2023 Acoustic Echo Challenge and ICASSP 2023 Deep Noise Suppression Challenge test sets, showing that a single model can handle multiple tasks with excellent performance.
arXiv Detail & Related papers (2023-06-05T18:37:05Z) - Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context [53.80051967863102]
We present a comprehensive analysis of Acoustic Scene Classification (ASC).
We propose an inception-based and low footprint ASC model, referred to as the ASC baseline.
Next, we improve the ASC baseline by proposing a novel deep neural network architecture.
arXiv Detail & Related papers (2022-10-16T19:07:21Z) - A Proposal for Foley Sound Synthesis Challenge [7.469200949273274]
"Foley" refers to sound effects that are added to multimedia during post-production to enhance its perceived acoustic properties.
We propose a challenge for automatic foley synthesis.
arXiv Detail & Related papers (2022-07-21T21:19:07Z) - RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble [19.041809003928506]
We propose a weighted hierarchical rank fusion approach to meet the challenge of building a search engine for a rapidly evolving biomedical collection.
Our ablation studies demonstrate the contributions of each of these systems to the overall ensemble.
The submitted ensemble runs achieved state-of-the-art performance in rounds 4 and 5 of the TREC-COVID challenge.
arXiv Detail & Related papers (2020-10-01T05:27:51Z)