Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
- URL: http://arxiv.org/abs/2204.02381v1
- Date: Tue, 5 Apr 2022 17:40:19 GMT
- Title: Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning
- Authors: Nilaksh Das, Duen Horng Chau
- Abstract summary: We investigate the impact of performing multi-task learning on the adversarial robustness of ASR models in the speech domain.
Our approach shows considerable absolute improvements in adversarially targeted WER ranging from 17.25 up to 59.90.
Ours is the first in-depth study that uncovers adversarial robustness gains from multi-task learning for ASR.
- Score: 13.735883484044166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As automatic speech recognition (ASR) systems are now being widely deployed
in the wild, the increasing threat of adversarial attacks raises serious
questions about the security and reliability of using such systems. On the
other hand, multi-task learning (MTL) has shown success in training models that
can resist adversarial attacks in the computer vision domain. In this work, we
investigate the impact of performing such multi-task learning on the
adversarial robustness of ASR models in the speech domain. We conduct extensive
MTL experimentation by combining semantically diverse tasks such as accent
classification and ASR, and evaluate a wide range of adversarial settings. Our
thorough analysis reveals that performing MTL with semantically diverse tasks
consistently makes it harder for an adversarial attack to succeed. We also
discuss in detail the serious pitfalls and their related remedies that have a
significant impact on the robustness of MTL models. Our proposed MTL approach
shows considerable absolute improvements in adversarially targeted WER ranging
from 17.25 up to 59.90 compared to single-task learning baselines (attention
decoder and CTC respectively). Ours is the first in-depth study that uncovers
adversarial robustness gains from multi-task learning for ASR.
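The MTL setup described in the abstract can be pictured as a shared speech encoder feeding two task-specific heads, one producing ASR outputs (CTC or an attention decoder in the paper) and one classifying the speaker's accent, trained with a weighted joint loss. The sketch below is a minimal PyTorch illustration of that idea; the BiLSTM encoder, layer sizes, vocabulary, and the loss weight `alpha` are assumptions made for illustration and do not reproduce the authors' exact architecture or training recipe.

```python
import torch
import torch.nn as nn

class MultiTaskASR(nn.Module):
    """Illustrative shared-encoder model: CTC-based ASR head + accent classification head."""

    def __init__(self, n_mels=80, hidden=256, vocab_size=32, n_accents=8):
        super().__init__()
        # Shared encoder over log-mel features (a small BiLSTM stands in for the
        # paper's encoder; all sizes here are illustrative assumptions).
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2,
                               batch_first=True, bidirectional=True)
        # Task head 1: frame-level logits for CTC speech recognition.
        self.ctc_head = nn.Linear(2 * hidden, vocab_size)
        # Task head 2: utterance-level accent classifier (the semantically
        # diverse auxiliary task).
        self.accent_head = nn.Linear(2 * hidden, n_accents)

    def forward(self, feats):
        enc, _ = self.encoder(feats)                        # (B, T, 2*hidden)
        ctc_logits = self.ctc_head(enc)                     # (B, T, vocab_size)
        accent_logits = self.accent_head(enc.mean(dim=1))   # (B, n_accents)
        return ctc_logits, accent_logits


def mtl_loss(ctc_logits, accent_logits, targets, feat_lens, target_lens,
             accent_labels, alpha=0.5):
    """Joint objective: weighted sum of the CTC ASR loss and accent cross-entropy."""
    log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)  # (T, B, V) for CTCLoss
    asr_loss = nn.CTCLoss(blank=0, zero_infinity=True)(
        log_probs, targets, feat_lens, target_lens)
    accent_loss = nn.functional.cross_entropy(accent_logits, accent_labels)
    return alpha * asr_loss + (1.0 - alpha) * accent_loss


# Toy usage with random tensors.
model = MultiTaskASR()
feats = torch.randn(4, 200, 80)                  # batch of log-mel feature frames
ctc_logits, accent_logits = model(feats)
loss = mtl_loss(ctc_logits, accent_logits,
                targets=torch.randint(1, 32, (4, 20)),   # token ids; 0 is the CTC blank
                feat_lens=torch.full((4,), 200),
                target_lens=torch.full((4,), 20),
                accent_labels=torch.randint(0, 8, (4,)))
loss.backward()
```

The intuition, consistent with the abstract's findings, is that a perturbation crafted against the ASR head must also contend with a shared representation that is constrained by the accent task, which makes targeted attacks harder to carry out.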
Related papers
- AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning [0.0]
Multimodal machine learning models that combine visual and textual data are increasingly being deployed in critical applications.
This paper presents an effective strategy to enhance the robustness of multimodal image captioning models against adversarial attacks.
arXiv Detail & Related papers (2024-07-30T20:28:31Z)
- Detecting and Understanding Vulnerabilities in Language Models via Mechanistic Interpretability [44.99833362998488]
Large Language Models (LLMs) have shown impressive performance across a wide range of tasks.
LLMs in particular are known to be vulnerable to adversarial attacks, where an imperceptible change to the input can mislead the output of the model.
We propose a method, based on Mechanistic Interpretability (MI) techniques, to guide this process.
arXiv Detail & Related papers (2024-07-29T09:55:34Z)
- VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z)
- On the Robustness of Large Multimodal Models Against Image Adversarial Attacks [81.2935966933355]
We study the impact of visual adversarial attacks on Large Multimodal Models (LMMs).
We find that, in general, LMMs are not robust to visual adversarial inputs.
We propose a new approach to real-world image classification which we term query decomposition.
arXiv Detail & Related papers (2023-12-06T04:59:56Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- Visual Adversarial Examples Jailbreak Aligned Large Language Models [66.53468356460365]
We show that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks.
We exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision.
Our study underscores the escalating adversarial risks associated with the pursuit of multimodality.
arXiv Detail & Related papers (2023-06-22T22:13:03Z)
- Multi-Task Models Adversarial Attacks [25.834775498006657]
Multi-Task Learning involves developing a single model, known as a multi-task model, to perform multiple tasks concurrently.
The security of single-task models has been thoroughly studied, but multi-task models pose several critical security questions.
This paper addresses these queries through detailed analysis and rigorous experimentation.
arXiv Detail & Related papers (2023-05-20T03:07:43Z)
- Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders.
We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
- SkeleVision: Towards Adversarial Resiliency of Person Tracking with Multi-Task Learning [12.245882404444881]
We study the impact of multi-task learning (MTL) on the adversarial robustness of the widely used SiamRPN tracker.
Specifically, we investigate the effect of jointly learning with semantically analogous tasks of person tracking and human keypoint detection.
Our empirical study with simulated as well as real-world datasets reveals that training with MTL consistently makes it harder to attack the SiamRPN tracker.
arXiv Detail & Related papers (2022-04-02T01:21:09Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.