SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
- URL: http://arxiv.org/abs/2503.16578v1
- Date: Thu, 20 Mar 2025 11:31:47 GMT
- Title: SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
- Authors: Yang Chen, Hui Wang, Shiyao Wang, Junyang Chen, Jiabei He, Jiaming Zhou, Xi Yang, Yequan Wang, Yonghua Lin, Yong Qin
- Abstract summary: SeniorTalk is a carefully annotated Chinese spoken dialogue dataset. This dataset contains 55.53 hours of speech from 101 natural conversations involving 202 participants. We perform experiments on speaker verification, speaker diarization, speech recognition, and speech editing tasks.
- Score: 23.837811649327094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While voice technologies increasingly serve aging populations, current systems exhibit significant performance gaps due to inadequate training data capturing elderly-specific vocal characteristics like presbyphonia and dialectal variations. The limited data available on super-aged individuals in existing elderly speech datasets, coupled with overly simple recording styles and annotation dimensions, exacerbates this issue. To address the critical scarcity of speech data from individuals aged 75 and above, we introduce SeniorTalk, a carefully annotated Chinese spoken dialogue dataset. This dataset contains 55.53 hours of speech from 101 natural conversations involving 202 participants, ensuring a strategic balance across gender, region, and age. Through detailed annotation across multiple dimensions, it can support a wide range of speech tasks. We perform extensive experiments on speaker verification, speaker diarization, speech recognition, and speech editing tasks, offering crucial insights for the development of speech technologies targeting this age group.
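The speaker verification experiments mentioned in the abstract are conventionally scored with the equal error rate (EER): the operating point where the false-acceptance and false-rejection rates coincide. As a minimal illustration of how that metric is computed from trial scores (the scores and labels below are made up for demonstration and are not drawn from SeniorTalk), a threshold-sweep implementation looks like this:

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping each observed score as a
    decision threshold and returning the mean of FAR and FRR at
    the threshold where they are closest.

    scores: per-trial similarity scores (higher = more likely same speaker)
    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial
    """
    n_target = labels.count(1)
    n_nontarget = labels.count(0)
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # False acceptance: non-target trials accepted at this threshold
        far = sum(s >= t and l == 0 for s, l in zip(scores, labels)) / n_nontarget
        # False rejection: target trials rejected at this threshold
        frr = sum(s < t and l == 1 for s, l in zip(scores, labels)) / n_target
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy trial list: one target trial scores low (0.4) and one non-target
# trial scores high (0.6), so the system cannot separate them perfectly.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(equal_error_rate(scores, labels))  # → 0.25
```

With a finite trial list the FAR and FRR curves are step functions and rarely cross exactly, so this sketch reports the midpoint at the nearest crossing; evaluation toolkits typically interpolate instead, but the idea is the same.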
Related papers
- SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description [19.064845530513285]
We propose an automatic speech annotation system that annotates in-the-wild speech clips with expressive and vivid natural language descriptions.
Our system provides in-depth understandings of speech style through tailored natural language descriptions.
The resulting dataset is distinguished by highly descriptive natural language style prompts, containing approximately 2,000 hours of audio data and encompassing over two million speech clips.
arXiv Detail & Related papers (2024-08-24T15:36:08Z) - Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model [47.67067056593085]
We develop a pipeline capable of transforming single-channel dialogue data into pseudo-stereo data.
This expanded our training dataset from a mere 2,000 to an impressive 17,600 hours.
The inclusion of this pseudo-stereo data has proven to be effective in improving the performance of spoken dialogue language models.
arXiv Detail & Related papers (2024-07-02T03:22:41Z) - EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z) - Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation [53.01238689626378]
We propose a novel approach to leverage semantic information in speaker diarization systems.
We introduce spoken language understanding modules to extract speaker-related semantic information.
We present a novel framework to integrate these constraints into the speaker diarization pipeline.
arXiv Detail & Related papers (2023-09-19T09:13:30Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification [0.0]
We consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups.
Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive adversarial network in multi-objective scenarios.
arXiv Detail & Related papers (2023-01-22T05:01:13Z) - Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset [77.99182201815763]
The aim of this work is to define a speech emotion recognition (SER) model able to recognize positive, neutral, and negative emotions in natural conversations of Italian elderly people.
arXiv Detail & Related papers (2022-11-14T12:39:41Z) - Data-augmented cross-lingual synthesis in a teacher-student framework [3.2548794659022398]
Cross-lingual synthesis is the task of letting a speaker generate fluent synthetic speech in another language.
Previous research shows that many models appear to have insufficient generalization capabilities.
We propose to apply the teacher-student paradigm to cross-lingual synthesis.
arXiv Detail & Related papers (2022-03-31T20:01:32Z) - A Review of Speaker Diarization: Recent Advances with Deep Learning [78.20151731627958]
Speaker diarization is a task to label audio or video recordings with classes corresponding to speaker identity.
With the rise of deep learning technology, more rapid advancements have been made for speaker diarization.
We discuss how speaker diarization systems have been integrated with speech recognition applications.
arXiv Detail & Related papers (2021-01-24T01:28:05Z) - Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.