The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
- URL: http://arxiv.org/abs/2308.10281v1
- Date: Sun, 20 Aug 2023 14:29:04 GMT
- Title: The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
- Authors: Zexin Cai, Weiqing Wang, Yikang Wang, Ming Li
- Abstract summary: This paper introduces our system designed for Track 2 of the Audio Deepfake Detection Challenge (ADD 2023).
Our top-performing solution achieves an impressive 82.23% sentence accuracy and an F1 score of 60.66%.
This results in a final ADD score of 0.6713, securing the first rank in Track 2 of ADD 2023.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces our system for Track 2 of the second Audio Deepfake Detection Challenge (ADD 2023), which focuses on locating manipulated regions. Our approach uses multiple detection systems to identify splicing regions and determine their authenticity. Specifically, we train and integrate two frame-level systems: one for boundary detection and the other for deepfake detection. Additionally, we employ a third system, a VAE model trained exclusively on genuine data, to determine the authenticity of a given audio clip. By fusing these three systems, our top-performing solution achieves 82.23% sentence accuracy and an F1 score of 60.66%, yielding a final ADD score of 0.6713 and securing first rank in Track 2 of ADD 2023.
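The abstract describes fusing two frame-level detectors (boundary and deepfake) with a clip-level VAE authenticity score. The paper does not publish its fusion rule here, so the following is only a minimal illustrative sketch of one common approach, a weighted linear score fusion; the function name, weights, and threshold are all hypothetical, not the authors' values.

```python
import numpy as np

def fuse_scores(boundary_scores, deepfake_scores, vae_score,
                weights=(0.4, 0.4, 0.2), threshold=0.5):
    """Fuse per-frame boundary and deepfake scores with a clip-level
    VAE authenticity score into per-frame manipulation decisions.

    Weights and threshold are illustrative placeholders, not the
    values used by the DKU-DUKEECE system.
    """
    boundary_scores = np.asarray(boundary_scores, dtype=float)
    deepfake_scores = np.asarray(deepfake_scores, dtype=float)
    w_b, w_d, w_v = weights
    # The scalar clip-level VAE score broadcasts across all frames.
    fused = w_b * boundary_scores + w_d * deepfake_scores + w_v * vae_score
    return fused >= threshold  # True marks a frame as manipulated

frames = fuse_scores([0.9, 0.1], [0.8, 0.2], vae_score=0.5)
# fused scores: [0.78, 0.22] -> [True, False]
```

In practice the weights would be tuned on a development set, and the frame-level decisions post-processed (e.g. median filtering) into contiguous manipulated regions.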
Related papers
- Uncertainty Estimation for 3D Object Detection via Evidential Learning [63.61283174146648]
We introduce a framework for quantifying uncertainty in 3D object detection by leveraging an evidential learning loss on Bird's Eye View representations in the 3D detector.
We demonstrate both the efficacy and importance of these uncertainty estimates in identifying out-of-distribution scenes, poorly localized objects, and missing (false-negative) detections.
arXiv Detail & Related papers (2024-10-31T13:13:32Z)
- Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024 [8.940008511570207]
This work details our approach to achieving a leading system with a 1.79% pooled equal error rate (EER).
The rapid advancement of generative AI models presents significant challenges for detecting AI-generated deepfake singing voices.
The Singing Voice Deepfake Detection (SVDD) Challenge 2024 aims to address this complex task.
arXiv Detail & Related papers (2024-09-03T21:28:45Z)
- The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report [180.94772271910315]
This paper reviews the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions.
The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs.
The challenge had 262 registered participants, and 34 teams made valid submissions.
arXiv Detail & Related papers (2024-04-16T07:26:20Z)
- Bridging the Gap Between End-to-End and Two-Step Text Spotting [88.14552991115207]
Bridging Text Spotting is a novel approach that resolves the error-accumulation and suboptimal-performance issues of two-step methods.
We demonstrate the effectiveness of the proposed method through extensive experiments.
arXiv Detail & Related papers (2024-04-06T13:14:04Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes the representation of each modality by fusing them at different levels of the audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Sparse4D v3: Advancing End-to-End 3D Detection and Tracking [12.780544029261353]
We introduce two auxiliary training tasks and propose decoupled attention to make structural improvements.
We extend the detector into a tracker using a straightforward approach that assigns an instance ID during inference.
Our best model achieved 71.9% NDS and 67.7% AMOTA on the nuScenes test set.
arXiv Detail & Related papers (2023-11-20T12:37:58Z)
- TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection [11.27584658526063]
The second Audio Deepfake Detection Challenge (ADD 2023) aims to detect and analyze deepfake speech utterances.
We propose our novel TranssionADD system as a solution to the challenging problems of model robustness and audio segment outliers.
Our best submission achieved 2nd place in Track 2, demonstrating the effectiveness and robustness of our proposed system.
arXiv Detail & Related papers (2023-06-27T05:18:25Z)
- SSDA3D: Semi-supervised Domain Adaptation for 3D Object Detection from Point Cloud [125.9472454212909]
We present a novel Semi-Supervised Domain Adaptation method for 3D object detection (SSDA3D).
SSDA3D includes an Inter-domain Adaptation stage and an Intra-domain Generalization stage.
Experiments show that, with only 10% labeled target data, our SSDA3D can surpass the fully supervised oracle model trained with 100% target labels.
arXiv Detail & Related papers (2022-12-06T09:32:44Z)
- The Royalflush System for VoxCeleb Speaker Recognition Challenge 2022 [4.022057598291766]
We describe the Royalflush submissions for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22).
For track 1, we develop a powerful U-Net-based speaker embedding extractor with a symmetric architecture.
For track 3, we employ joint training with source-domain supervision and target-domain self-supervision to obtain a speaker embedding extractor.
arXiv Detail & Related papers (2022-09-19T13:35:36Z)
- USTC-NELSLIP System Description for DIHARD-III Challenge [78.40959509760488]
The innovation of our system lies in combining various front-end techniques to solve the diarization problem.
Our best system achieved DERs of 11.30% on track 1 and 16.78% on track 2 on the evaluation set.
arXiv Detail & Related papers (2021-03-19T07:00:51Z)
- DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team [22.657782236219933]
We present the submitted system for the second DIHARD Speech Diarization Challenge from the DKU-LENOVO team.
Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation, and overlap detection.
Although our systems reduced the DERs by 27.5% and 31.7% relative against the official baselines, we believe the diarization task remains very difficult.
arXiv Detail & Related papers (2020-02-23T11:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.