An Integrated Algorithm for Robust and Imperceptible Audio Adversarial
Examples
- URL: http://arxiv.org/abs/2310.03349v1
- Date: Thu, 5 Oct 2023 06:59:09 GMT
- Title: An Integrated Algorithm for Robust and Imperceptible Audio Adversarial
Examples
- Authors: Armin Ettenhofer and Jan-Philipp Schulze and Karla Pizzi
- Abstract summary: Most methods generate audio adversarial examples in two steps: a viable adversarial audio file is produced first and is then fine-tuned with respect to perceptibility and robustness.
We present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step.
- Score: 2.2866551516539726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio adversarial examples are audio files that have been manipulated to fool
an automatic speech recognition (ASR) system, while still sounding benign to a
human listener. Most methods to generate such samples are based on a two-step
algorithm: first, a viable adversarial audio file is produced, then, this is
fine-tuned with respect to perceptibility and robustness. In this work, we
present an integrated algorithm that uses psychoacoustic models and room
impulse responses (RIR) in the generation step. The RIRs are dynamically
created by a neural network during the generation process to simulate a
physical environment to harden our examples against transformations experienced
in over-the-air attacks. We compare the different approaches in three
experiments: in a simulated environment and in a realistic over-the-air
scenario to evaluate the robustness, and in a human study to evaluate the
perceptibility. Our algorithms that consider psychoacoustics, either alone or
together with robustness, improve the signal-to-noise ratio (SNR) and fare
better in the human perception study, at the cost of an increased word error
rate (WER).
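The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `convolve`, `snr_db`, and `wer` are hypothetical, the RIR is a hand-written toy list rather than one produced by a neural network, and SNR is assumed here to mean the ratio of original signal power to perturbation power in decibels. Convolving a signal with an RIR simulates how a room transforms played-back audio, SNR measures how large the adversarial perturbation is, and WER measures how far a transcription is from a reference text.

```python
import math


def convolve(signal, rir):
    """Direct linear convolution of a signal with a room impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def snr_db(original, adversarial):
    """SNR in dB, treating (adversarial - original) as the noise (assumption)."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((a - x) ** 2 for x, a in zip(original, adversarial))
    if signal_power == 0:
        raise ValueError("original signal has zero power")
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    original = [1.0, 1.0, 1.0, 1.0]
    adversarial = [1.1, 0.9, 1.1, 0.9]
    toy_rir = [1.0, 0.5]  # direct path plus one weaker echo (made-up values)
    print(convolve(adversarial, toy_rir))
    print(snr_db(original, adversarial))
    print(wer("turn on the light", "turn off the light"))
```

A higher SNR means a quieter perturbation. WER is commonly computed against the original transcription, but which reference text is used depends on the evaluation setup, so that choice is left to the caller here.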
Related papers
- Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems [0.3277163122167434]
We propose a novel method for constructing a realistic training set that includes mixture signals and corresponding ground truths for each speaker.
We get a 1.65 dB improvement in Scale Invariant Signal to Distortion Ratio (SI-SDR) for speaker separation accuracy in realistic mixing.
arXiv Detail & Related papers (2024-11-13T06:55:18Z)
- Reassessing Noise Augmentation Methods in the Context of Adversarial Speech [12.488332326259469]
We investigate if noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition systems.
The results demonstrate that noise augmentation improves not only model performance on noisy speech but also the model's robustness to adversarial attacks.
arXiv Detail & Related papers (2024-09-03T11:51:10Z)
- DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs [12.234206036041218]
We use the deep neural network (DNN) DeepSpeech2 as a paradigm to investigate how natural input and cochlear implant-based inputs are processed over time.
We generate naturalistic and cochlear implant-like inputs from spoken sentences and test the similarity of model performance to human performance.
We find that dynamics over time in each layer are affected by context as well as input type.
arXiv Detail & Related papers (2024-07-30T04:32:27Z)
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z)
- Perlin Noise Improve Adversarial Robustness [9.084544535198509]
Adversarial examples are special inputs that can perturb the output of a deep neural network.
Most of the present methods for generating adversarial examples require gradient information.
Procedural-noise adversarial examples are a new way of generating adversarial examples.
arXiv Detail & Related papers (2021-12-26T15:58:28Z)
- Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASRS, we propose techniques that automatically generate blackbox.
arXiv Detail & Related papers (2021-12-03T10:21:47Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.