The First Voice Timbre Attribute Detection Challenge
- URL: http://arxiv.org/abs/2509.06635v1
- Date: Mon, 08 Sep 2025 12:54:28 GMT
- Title: The First Voice Timbre Attribute Detection Challenge
- Authors: Liping Chen, Jinghao He, Zhengyan Sheng, Kong Aik Lee, Zhen-Hua Ling
- Abstract summary: The first voice timbre attribute detection challenge is featured in a special session at NCMMSC 2025. It focuses on the explainability of voice timbre and compares the intensity of two speech utterances in a specified timbre descriptor dimension. The evaluation was conducted on the VCTK-RVA dataset.
- Score: 65.1653769568636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The first voice timbre attribute detection challenge is featured in a special session at NCMMSC 2025. It focuses on the explainability of voice timbre and compares the intensity of two speech utterances in a specified timbre descriptor dimension. The evaluation was conducted on the VCTK-RVA dataset. Participants developed their systems and submitted their outputs to the organizer, who evaluated the performance and sent feedback to them. Six teams submitted their outputs, with five providing descriptions of their methodologies.
Related papers
- The Third VoicePrivacy Challenge: Preserving Emotional Expressiveness and Linguistic Content in Voice Anonymization [37.8365190579156]
We present results and analyses from the third VoicePrivacy Challenge held in 2024. The task was to develop a voice anonymization system for speech data that conceals a speaker's voice identity while preserving linguistic content and emotional state.
arXiv Detail & Related papers (2026-01-17T00:33:16Z)
- Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM [53.17360668423001]
Overlapping Speech Detection (OSD) aims to identify regions where multiple speakers overlap in a conversation. This work proposes a speaker-aware progressive OSD model that leverages a progressive training strategy to enhance the correlation between subtasks. Experimental results show that the proposed method achieves state-of-the-art performance, with an F1 score of 82.76% on the AMI test set.
arXiv Detail & Related papers (2025-05-29T07:47:48Z)
- P2VA: Converting Persona Descriptions into Voice Attributes for Fair and Controllable Text-to-Speech [12.143236645802787]
We introduce Persona-to-Voice-Attribute (P2VA), the first framework enabling voice generation automatically from persona descriptions. Our approach employs two strategies: P2VA-C for structured voice attributes, and P2VA-O for richer style descriptions.
arXiv Detail & Related papers (2025-05-21T01:28:56Z)
- Introducing voice timbre attribute detection [40.14712328633083]
This paper focuses on explaining the timbre conveyed by speech signals and introduces a task termed voice timbre attribute detection (vTAD). A pair of speech utterances is processed, and their intensity is compared in a designated timbre descriptor. A framework is proposed, which is built upon the speaker embeddings extracted from the speech utterances.
arXiv Detail & Related papers (2025-05-14T13:46:46Z)
- Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model [8.745106905496284]
This paper presents a novel approach for the automatic generation of Cued Speech (ACSG). We explore transfer learning strategies by leveraging a pre-trained audiovisual autoregressive text-to-speech model (AVTacotron2). With a decoding accuracy at the phonetic level reaching approximately 77%, the results demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-01-08T19:26:43Z)
- VoiceBench: Benchmarking LLM-Based Voice Assistants [58.84144494938931]
We introduce VoiceBench, the first benchmark to evaluate voice assistants based on large language models (LLMs). VoiceBench includes both real and synthetic spoken instructions that incorporate three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.
arXiv Detail & Related papers (2024-10-22T17:15:20Z)
- Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets [6.617487928813374]
We present a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers.
Our system ranked first in the 2023 DCASE Challenge, and it outperforms the current state of the art on the ClothoV2 benchmark by 5.6 pp. mAP@10.
arXiv Detail & Related papers (2023-08-08T13:46:55Z)
- High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units [69.06657692891447]
We propose a novel AVO method leveraging the learning objective of self-supervised discrete speech unit prediction.
Experimental results show that our proposed method achieves remarkable lip-speech synchronization and high speech quality.
arXiv Detail & Related papers (2023-06-29T15:02:22Z)
- The VoicePrivacy 2022 Challenge Evaluation Plan [46.807999940446294]
Training, development and evaluation datasets are provided.
Participants apply their developed anonymization systems.
Results will be presented at a workshop held in conjunction with INTERSPEECH 2022.
arXiv Detail & Related papers (2022-03-23T15:05:18Z)
- CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings [87.37967358673252]
We organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).
The challenge revisits the previous CHiME-5 challenge and further considers the problem of distant multi-microphone conversational speech diarization and recognition.
This paper provides a baseline description of the CHiME-6 challenge for both segmented multispeaker speech recognition and unsegmented multispeaker speech recognition.
arXiv Detail & Related papers (2020-04-20T12:59:07Z)