Algorithms for Adversarially Robust Deep Learning
- URL: http://arxiv.org/abs/2509.19100v1
- Date: Tue, 23 Sep 2025 14:48:58 GMT
- Title: Algorithms for Adversarially Robust Deep Learning
- Authors: Alexander Robey,
- Abstract summary: We discuss recent progress toward designing algorithms that exhibit desirable robustness properties.<n>We present new algorithms that achieve state-of-the-art generalization in medical imaging, molecular identification, and image classification.<n>We propose new attacks and defenses, which represent the frontier of progress toward designing robust language-based agents.
- Score: 58.656107500646364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the widespread use of deep learning models in safety-critical applications, ensuring that the decisions of such models are robust against adversarial exploitation is of fundamental importance. In this thesis, we discuss recent progress toward designing algorithms that exhibit desirable robustness properties. First, we discuss the problem of adversarial examples in computer vision, for which we introduce new technical results, training paradigms, and certification algorithms. Next, we consider the problem of domain generalization, wherein the task is to train neural networks to generalize from a family of training distributions to unseen test distributions. We present new algorithms that achieve state-of-the-art generalization in medical imaging, molecular identification, and image classification. Finally, we study the setting of jailbreaking large language models (LLMs), wherein an adversarial user attempts to design prompts that elicit objectionable content from an LLM. We propose new attacks and defenses, which represent the frontier of progress toward designing robust language-based agents.
Related papers
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets compressing and reconstructing data, and intelligence.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose a novel problem of Coding for Intelligence from the category theory view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z) - Topological safeguard for evasion attack interpreting the neural
networks' behavior [0.0]
In this work, a novel detector of evasion attacks is developed.
It focuses on the information of the activations of the neurons given by the model when an input sample is injected.
For this purpose, a huge data preprocessing is required to introduce all this information in the detector.
arXiv Detail & Related papers (2024-02-12T08:39:40Z) - A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the adversarial transferability of adversarial examples.
arXiv Detail & Related papers (2023-10-26T17:45:26Z) - Mitigating Adversarial Attacks in Deepfake Detection: An Exploration of
Perturbation and AI Techniques [1.0718756132502771]
adversarial examples are subtle perturbations artfully injected into clean images or videos.
Deepfakes have emerged as a potent tool to manipulate public opinion and tarnish the reputations of public figures.
This article delves into the multifaceted world of adversarial examples, elucidating the underlying principles behind their capacity to deceive deep learning algorithms.
arXiv Detail & Related papers (2023-02-22T23:48:19Z) - Graph Neural Networks for Decentralized Multi-Agent Perimeter Defense [111.9039128130633]
We develop an imitation learning framework that learns a mapping from defenders' local perceptions and their communication graph to their actions.
We run perimeter defense games in scenarios with different team sizes and configurations to demonstrate the performance of the learned network.
arXiv Detail & Related papers (2023-01-23T19:35:59Z) - Adversarial Machine Learning In Network Intrusion Detection Domain: A
Systematic Review [0.0]
It has been found that deep learning models are vulnerable to data instances that can mislead the model to make incorrect classification decisions.
This survey explores the researches that employ different aspects of adversarial machine learning in the area of network intrusion detection.
arXiv Detail & Related papers (2021-12-06T19:10:23Z) - Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z) - A Generative Model based Adversarial Security of Deep Learning and
Linear Classifier Models [0.0]
We have proposed a mitigation method for adversarial attacks against machine learning models with an autoencoder model.
The main idea behind adversarial attacks against machine learning models is to produce erroneous results by manipulating trained models.
We have also presented the performance of autoencoder models to various attack methods from deep neural networks to traditional algorithms.
arXiv Detail & Related papers (2020-10-17T17:18:17Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z) - Opportunities and Challenges in Deep Learning Adversarial Robustness: A
Survey [1.8782750537161614]
This paper studies strategies to implement adversary robustly trained algorithms towards guaranteeing safety in machine learning algorithms.
We provide a taxonomy to classify adversarial attacks and defenses, formulate the Robust Optimization problem in a min-max setting, and divide it into 3 subcategories, namely: Adversarial (re)Training, Regularization Approach, and Certified Defenses.
arXiv Detail & Related papers (2020-07-01T21:00:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.