Multi-source Domain Adaptation for Text-independent Forensic Speaker
Recognition
- URL: http://arxiv.org/abs/2211.09913v1
- Date: Thu, 17 Nov 2022 22:11:25 GMT
- Authors: Zhenyu Wang and John H. L. Hansen
- Abstract summary: Adapting speaker recognition systems to new environments is a widely-used technique to improve a well-performing model.
Previous studies focus on single-domain adaptation, which neglects the more practical scenario where training data are collected from multiple acoustic domains.
Three novel adaptation methods are proposed to further promote adaptation performance across multiple acoustic domains.
- Score: 36.83842373791537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adapting speaker recognition systems to new environments is a
widely-used technique for improving a well-performing model learned from
large-scale data toward a task-specific, small-scale data scenario. However,
previous studies focus on single-domain adaptation, which neglects the more
practical scenario, common in forensics, where training data are collected
from multiple acoustic domains. Audio analysis for forensic speaker recognition offers
unique challenges in model training with multi-domain training data due to
location/scenario uncertainty and diversity mismatch between reference and
naturalistic field recordings. It is also difficult to directly employ
small-scale domain-specific data to train complex neural network architectures
due to domain mismatch and performance loss. Fine-tuning is a commonly-used
adaptation method that retrains a model with weights initialized from a
well-trained model. Alternatively, in this study, three novel adaptation
methods based on domain adversarial training, discrepancy minimization, and
moment-matching approaches are proposed to further promote adaptation
performance across multiple acoustic domains. A comprehensive set of
experiments is conducted to demonstrate that: 1) diverse acoustic environments
do impact speaker recognition performance, which could advance research in
audio forensics, 2) domain adversarial training learns the discriminative
features which are also invariant to shifts between domains, 3)
discrepancy-minimizing adaptation achieves effective performance simultaneously
across multiple acoustic domains, and 4) moment-matching adaptation along with
dynamic distribution alignment also significantly promotes speaker recognition
performance on each domain, especially the noisy LENA-field domain, compared
to all other systems.
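Of the three proposed methods, discrepancy minimization is the simplest to sketch in isolation: during adaptation, a distance between source- and target-domain embedding distributions is added to the training objective and driven toward zero. Maximum mean discrepancy (MMD) with an RBF kernel is one common choice of distance; the minimal NumPy sketch below is illustrative of the technique in general, not the paper's exact formulation or kernel settings.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of x and y.
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Squared maximum mean discrepancy (biased estimate) between a
    source-domain and a target-domain batch of embeddings."""
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()
```

In an adaptation loop, this scalar would be weighted and added to the speaker-classification loss, so minimizing the total loss pulls the two embedding distributions together.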
Related papers
- Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation [45.76004686788507]
We present a novel data simulation pipeline that produces diverse training data from a range of acoustic environments and content.
We propose new training paradigms to improve quality of a general speech separation model.
arXiv Detail & Related papers (2024-08-28T20:26:34Z)
- Cross-domain Voice Activity Detection with Self-Supervised Representations [9.02236667251654]
Voice Activity Detection (VAD) aims at detecting speech segments on an audio signal.
Current state-of-the-art methods focus on training a neural network exploiting features extracted directly from the acoustic signal.
We show that representations based on Self-Supervised Learning (SSL) can adapt well to different domains.
arXiv Detail & Related papers (2022-09-22T14:53:44Z)
- AdaStereo: An Efficient Domain-Adaptive Stereo Matching Approach [50.855679274530615]
We present a novel domain-adaptive approach called AdaStereo to align multi-level representations for deep stereo matching networks.
Our models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D and DrivingStereo.
Our method is robust to various domain adaptation settings, and can be easily integrated into quick adaptation application scenarios and real-world deployments.
arXiv Detail & Related papers (2021-12-09T15:10:47Z)
- DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning [69.70594547377283]
We propose a novel framework to disentangle speaker-related and domain-specific features.
Our framework can effectively generate more speaker-discriminative and domain-invariant speaker representations.
arXiv Detail & Related papers (2020-12-12T19:46:56Z)
- Ensemble of Discriminators for Domain Adaptation in Multiple Sound Source 2D Localization [7.564344795030588]
This paper introduces an ensemble of discriminators that improves the accuracy of a domain adaptation technique for the localization of multiple sound sources.
Recording and labeling such datasets is very costly, especially because data needs to be diverse enough to cover different acoustic conditions.
arXiv Detail & Related papers (2020-12-10T09:17:29Z)
- Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
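The partial fine-tuning described above can be illustrated with a toy network: the lower layers are frozen as a feature extractor, and only the high-level layer is updated on the small in-domain set. All layer sizes, the MSE loss, and the learning rate in this NumPy sketch are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=(16, 8))   # low-level layer: frozen during adaptation
w2 = rng.normal(size=(8, 4))    # high-level layer: fine-tuned in-domain

def finetune_step(x, y, lr=1e-2):
    """One SGD step that updates only the high-level layer w2."""
    global w2
    h = np.maximum(x @ w1, 0.0)           # frozen ReLU feature extractor
    pred = h @ w2                         # adapted head
    grad_out = 2.0 * (pred - y) / len(x)  # d(MSE)/d(pred)
    w2 -= lr * h.T @ grad_out             # w1 is intentionally left untouched
    return np.mean((pred - y) ** 2)
```

Freezing the lower layers preserves the representations learned from the large out-of-domain corpus (VoxCeleb in this entry) while the small in-domain set only has to re-estimate the top of the network.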
arXiv Detail & Related papers (2020-09-05T02:54:33Z)
- Adaptive Risk Minimization: Learning to Adapt to Domain Shift [109.87561509436016]
A fundamental assumption of most machine learning algorithms is that the training and test data are drawn from the same underlying distribution.
In this work, we consider the problem setting of domain generalization, where the training data are structured into domains and there may be multiple test time shifts.
We introduce the framework of adaptive risk minimization (ARM), in which models are directly optimized for effective adaptation to shift by learning to adapt on the training domains.
arXiv Detail & Related papers (2020-07-06T17:59:30Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
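Aligning first- and second-order band statistics amounts to standardizing each target-domain frequency band with its own mean and standard deviation, then re-scaling it with the source domain's per-band statistics. A minimal sketch over a (frames x bands) feature matrix follows; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def align_band_statistics(target, src_mean, src_std, eps=1e-8):
    """Match the first- and second-order statistics of each frequency band
    (column) of a target-domain feature matrix to source-domain values."""
    t_mean = target.mean(axis=0)
    t_std = target.std(axis=0)
    normalized = (target - t_mean) / (t_std + eps)  # zero mean, unit std per band
    return normalized * src_std + src_mean          # impose source band statistics
```

Because the transform is per-band and unsupervised, it needs no target-domain labels, only enough target frames to estimate each band's mean and variance reliably.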
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Toward Cross-Domain Speech Recognition with End-to-End Models [18.637636841477]
In this paper, we empirically examine the difference in behavior between hybrid acoustic models and neural end-to-end systems.
We show that for the hybrid models, supplying additional training data from other domains with mismatched acoustic conditions does not increase the performance on specific domains.
Our end-to-end models optimized with sequence-based criterion generalize better than the hybrid models on diverse domains.
arXiv Detail & Related papers (2020-03-09T15:19:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.