IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection
- URL: http://arxiv.org/abs/2506.19014v2
- Date: Thu, 26 Jun 2025 17:21:45 GMT
- Title: IndieFake Dataset: A Benchmark Dataset for Audio Deepfake Detection
- Authors: Abhay Kumar, Kunal Verma, Omkar More,
- Abstract summary: Deepfake technology offers benefits like AI assistants, better accessibility for speech impairments, and enhanced entertainment. It also poses significant risks to security, privacy, and trust in digital communications. Existing datasets lack diverse ethnic accents, making them inadequate for many real-world scenarios. This work introduces the IndieFake Dataset (IFD), featuring 27.17 hours of bonafide and deepfake audio from 50 English-speaking Indian speakers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in audio deepfake technology offer benefits like AI assistants, better accessibility for speech impairments, and enhanced entertainment. However, they also pose significant risks to security, privacy, and trust in digital communications. Detecting and mitigating these threats requires comprehensive datasets. Existing datasets lack diverse ethnic accents, making them inadequate for many real-world scenarios. Consequently, models trained on these datasets struggle to detect audio deepfakes in diverse linguistic and cultural contexts, such as South-Asian countries. Ironically, there is a stark lack of South-Asian speaker samples in existing datasets, even though South Asians constitute a quarter of the world's population. This work introduces the IndieFake Dataset (IFD), featuring 27.17 hours of bonafide and deepfake audio from 50 English-speaking Indian speakers. IFD offers a balanced data distribution and includes speaker-level characterization, which is absent in datasets like ASVspoof21 (DF). We evaluated various baselines on IFD against the existing ASVspoof21 (DF) and In-The-Wild (ITW) datasets. IFD outperforms ASVspoof21 (DF) and proves to be more challenging than the benchmark ITW dataset. The complete dataset, along with documentation and sample reference clips, is publicly accessible for research use on the project website.
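The abstract reports baseline comparisons across IFD, ASVspoof21 (DF), and ITW without restating the evaluation protocol here. Below is a minimal sketch of an equal error rate (EER) computation, the metric conventionally reported for spoofing countermeasures in this line of work; the detector scores are synthetic placeholders (an assumption for illustration, not IFD results), and the function names are hypothetical.

```python
# Minimal sketch: scoring a deepfake-detection baseline with EER.
# The score distributions below are synthetic stand-ins, not IFD outputs.
import numpy as np

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """EER from detector scores (higher = more likely bonafide) and
    binary labels (1 = bonafide, 0 = deepfake)."""
    order = np.argsort(scores)
    labels = np.asarray(labels, dtype=float)[order]
    n_bona = labels.sum()
    n_fake = len(labels) - n_bona
    # Sweep the decision threshold over the sorted scores:
    # FRR = bonafide rejected below threshold, FAR = deepfakes accepted above it.
    frr = np.cumsum(labels) / max(n_bona, 1)
    far = 1.0 - np.cumsum(1.0 - labels) / max(n_fake, 1)
    idx = np.argmin(np.abs(frr - far))
    return float((frr[idx] + far[idx]) / 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic detector outputs: bonafide scores centred higher than deepfake
    # scores, mimicking a better-than-chance baseline.
    bona = rng.normal(1.0, 1.0, 5000)
    fake = rng.normal(-1.0, 1.0, 5000)
    scores = np.concatenate([bona, fake])
    labels = np.concatenate([np.ones(5000), np.zeros(5000)])
    print(f"EER: {100 * equal_error_rate(scores, labels):.2f}%")
```

Replacing the synthetic scores with the outputs of an actual countermeasure run separately over IFD, ASVspoof21 (DF), and ITW would yield the kind of cross-dataset comparison the abstract describes.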
Related papers
- MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark [108.46287432944392]
We present the first large-scale open-set benchmark for multilingual audio-video deepfake detection. Our dataset comprises over 250 hours of real and fake videos across eight languages. For each language, the fake videos are generated with seven distinct deepfake generation models.
arXiv Detail & Related papers (2025-05-16T10:42:30Z)
- Bridging the Data Provenance Gap Across Text, Speech and Video [67.72097952282262]
We conduct the largest and first-of-its-kind longitudinal audit across modalities of popular text, speech, and video datasets. Our manual analysis covers nearly 4000 public datasets between 1990-2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets.
arXiv Detail & Related papers (2024-12-19T01:30:19Z)
- Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset [11.164272928464879]
Fake videos or speeches in Hindi can have an enormous impact on rural and semi-urban communities.
This paper aims to create the first Hindi deepfake dataset, named "Hindi audio-video-Deepfake" (HAV-DF).
arXiv Detail & Related papers (2024-11-23T05:18:43Z)
- SpoofCeleb: Speech Deepfake Detection and SASV In The Wild [76.71096751337888]
SpoofCeleb is a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV). SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions.
arXiv Detail & Related papers (2024-09-18T23:17:02Z)
- Deepfake audio as a data augmentation technique for training automatic speech to text transcription models [55.2480439325792]
We propose a framework that approaches data augmentation based on deepfake audio.
A dataset produced by Indians (in English) was selected, ensuring the presence of a single accent.
arXiv Detail & Related papers (2023-09-22T11:33:03Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD).
Partially fake audio in the HAD dataset involves only changing a few words in an utterance.
Using this dataset, we can not only detect fake utterances but also localize manipulated regions within a speech signal.
arXiv Detail & Related papers (2021-04-08T08:57:13Z)