An Effective Energy Mask-based Adversarial Evasion Attacks against Misclassification in Speaker Recognition Systems
- URL: http://arxiv.org/abs/2601.22390v1
- Date: Thu, 29 Jan 2026 22:58:20 GMT
- Title: An Effective Energy Mask-based Adversarial Evasion Attacks against Misclassification in Speaker Recognition Systems
- Authors: Chanwoo Park, Chanwoo Kim
- Abstract summary: Adversarial attack methods have emerged as the most effective countermeasure against the indiscriminate use of voice data. This research introduces masked energy perturbation (MEP), a novel approach using power spectrum for energy masking of original voice data. The proposed MEP method demonstrated strong performance in both audio quality and evasion effectiveness.
- Score: 15.9691465248047
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Evasion attacks pose significant threats to AI systems, exploiting vulnerabilities in machine learning models to bypass detection mechanisms. The widespread use of voice data, including deepfakes, in promising future industries is currently hindered by insufficient legal frameworks. Adversarial attack methods have emerged as the most effective countermeasure against the indiscriminate use of such data. This research introduces masked energy perturbation (MEP), a novel approach using power spectrum for energy masking of original voice data. MEP applies masking to small energy regions in the frequency domain before generating adversarial perturbations, targeting areas less noticeable to the human auditory model. The study primarily employs advanced speaker recognition models, including ECAPA-TDNN and ResNet34, which have shown remarkable performance in speaker verification tasks. The proposed MEP method demonstrated strong performance in both audio quality and evasion effectiveness. The energy masking approach effectively minimizes the perceptual evaluation of speech quality (PESQ) degradation, indicating that minimal perceptual distortion occurs to the human listener despite the adversarial perturbations. Specifically, in the PESQ evaluation, the relative performance of the MEP method was 26.68% when compared to the fast gradient sign method (FGSM) and iterative FGSM.
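The abstract describes MEP only at a high level: mask low-energy regions of the power spectrum, then generate the adversarial perturbation there. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single short frame, a naive DFT, a quantile-based energy threshold, and a one-step sign perturbation in the spirit of FGSM. The function names, the `fraction` and `epsilon` parameters, and the toy gradient are all hypothetical and chosen for illustration.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a frame (O(n^2), fine for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps only the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def low_energy_mask(frame, fraction=0.5):
    """Return a 0/1 mask over DFT bins that selects the lowest-energy bins.

    Assumption: "small energy regions" is modeled as the bins whose power falls
    at or below a quantile of the one-sided power spectrum.
    """
    n = len(frame)
    half = n // 2 + 1
    power = [abs(c) ** 2 for c in dft(frame)[:half]]
    cutoff = sorted(power)[max(0, int(fraction * half) - 1)]
    half_mask = [1.0 if p <= cutoff else 0.0 for p in power]
    # Mirror the one-sided mask so the masked spectrum stays conjugate-symmetric.
    return [half_mask[min(k, n - k)] for k in range(n)]


def masked_sign_step(frame, grad, epsilon=0.01, fraction=0.5):
    """Apply one sign-gradient step whose spectrum is confined to masked bins."""
    mask = low_energy_mask(frame, fraction)
    step = [epsilon * ((g > 0) - (g < 0)) for g in grad]
    masked_spectrum = [m * c for m, c in zip(mask, dft(step))]
    delta = idft(masked_spectrum)
    return [x + d for x, d in zip(frame, delta)], delta


# Toy input: a strong tone at bin 3 plus a weak tone at bin 10.
N = 32
frame = [
    math.sin(2 * math.pi * 3 * t / N) + 0.01 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]
# Hypothetical loss gradient; a real attack would backpropagate through the
# speaker recognition model instead.
grad = [math.cos(0.7 * t) for t in range(N)]

mask = low_energy_mask(frame)
adv, delta = masked_sign_step(frame, grad)
```

In this sketch the perturbation carries no energy in the bins that hold the dominant tone, which is the intuition behind targeting regions that are less noticeable to a listener. An iterative variant would repeat `masked_sign_step` with a smaller `epsilon` and a fresh gradient at each step.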
Related papers
- SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning [52.29460857893198]
Existing fraud detection methods rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues like vocal tone and environmental context. We propose SAFE-QAQ, an end-to-end comprehensive framework for audio-based slow-thinking fraud detection. Our framework introduces a dynamic risk assessment framework during live calls, enabling early detection and prevention of fraud.
arXiv Detail & Related papers (2026-01-04T06:09:07Z)
- Dual Attention Guided Defense Against Malicious Edits [70.17363183107604]
We propose a Dual Attention-Guided Noise Perturbation (DANP) immunization method that adds imperceptible perturbations to disrupt the model's semantic understanding and generation process. Our method exhibits impressive immunity against malicious edits, and extensive experiments confirm that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-16T12:01:28Z)
- ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs [61.09812971042288]
Evolutionary Noise Jailbreak (ENJ). This paper proposes a genetic algorithm to transform environmental noise from a passive interference into an actively optimizable attack carrier for jailbreaking LSMs. Experiments on multiple mainstream speech models show that ENJ's attack effectiveness is significantly superior to existing baseline methods.
arXiv Detail & Related papers (2025-09-14T06:39:38Z)
- An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models [34.59032968400701]
Anomalous Sound Detection (ASD) aims at identifying anomalous sounds from machines. Uncertainty of anomaly location and much redundant information such as noise in machine sounds hinder the improvement of ASD system performance. This paper proposes a novel audio feature of filter banks with evenly distributed intervals, ensuring equal attention to all frequency ranges in the audio.
arXiv Detail & Related papers (2025-08-21T08:04:08Z)
- A Small-footprint Acoustic Echo Cancellation Solution for Mobile Full-Duplex Speech Interactions [1.5929852667227002]
This paper presents a neural network-based solution to address challenges in scenarios with varying hardware, nonlinear distortions and long latency. Progressive learning is employed to improve AEC augmentation effectiveness, resulting in a considerable improvement in speech quality.
arXiv Detail & Related papers (2025-08-11T02:45:31Z)
- Deep Active Speech Cancellation with Mamba-Masking Network [62.73250985838971]
We present a novel deep learning network for Active Speech Cancellation (ASC). The proposed Mamba-Masking architecture introduces a masking mechanism that directly interacts with the encoded reference signal. Experimental results demonstrate substantial performance gains, achieving up to 7.2 dB improvement in ANC scenarios and 6.2 dB in ASC.
arXiv Detail & Related papers (2025-02-03T09:22:26Z)
- SecONN: An Optical Neural Network Framework with Concurrent Detection of Thermal Fault Injection Attacks [0.7262345640500065]
This paper first proposes a threat of thermal fault injection attacks on SPAAs based on Vector-Matrix Multipliers (VMMs) utilizing Mach-Zehnder Interferometers.
This paper then proposes SecONN, an optical neural network framework that is capable of not only inferences but also concurrent detection of the attacks.
arXiv Detail & Related papers (2024-11-22T05:31:36Z)
- Adversarial Purification for Data-Driven Power System Event Classifiers with Diffusion Models [0.8848340429852071]
Global deployment of phasor measurement units (PMUs) enables real-time monitoring of the power system.
Recent studies reveal that machine learning-based methods are vulnerable to adversarial attacks.
This paper proposes an effective adversarial purification method based on the diffusion model to counter adversarial attacks.
arXiv Detail & Related papers (2023-11-13T06:52:56Z)
- Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise [18.19207291891767]
Adversarial attacks against deep ASR systems are highly successful.
This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection.
Inverse filter bank features generally perform better in both clean and noisy environments.
arXiv Detail & Related papers (2022-11-03T07:25:45Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.