Related papers: Adversarial Robustness in One-Stage Learning-to-Defer


URL: http://arxiv.org/abs/2510.10988v1
Date: Mon, 13 Oct 2025 03:55:55 GMT
Title: Adversarial Robustness in One-Stage Learning-to-Defer
Authors: Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi,
Abstract summary: Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$-, $(\mathcal{R}, \mathcal{F})$-, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.
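The abstract's two failure modes (flipped predictions and manipulated deferral) can be illustrated with a minimal sketch. It assumes a common one-stage layout in which a single scorer emits K class scores plus one deferral score per expert, with the argmax deciding whether to predict or defer, and it uses a hypothetical linear scorer so that the worst-case L-infinity perturbation has a closed form. The names `route` and `targeted_linf_attack` are illustrative only; this is not the authors' attack formalization or their cost-sensitive surrogate losses.

```python
import numpy as np


def route(scores: np.ndarray, num_classes: int) -> tuple:
    """Route one input from a joint score vector of length K + J.

    Assumed layout: indices 0..K-1 score the predictor's classes and
    indices K..K+J-1 score deferral to each of the J experts.
    """
    j = int(np.argmax(scores))
    if j < num_classes:
        return ("predict", j)  # keep the predictor's class-j label
    return ("defer", j - num_classes)  # hand the input to expert j - K


def targeted_linf_attack(W: np.ndarray, x: np.ndarray, target: int, eps: float) -> np.ndarray:
    """Signed-gradient step for a hypothetical linear scorer s(x) = W @ x.

    The gradient of s_target(x) - s_current(x) w.r.t. x is
    W[target] - W[current], so x + eps * sign(grad) maximizes that margin
    over the L-infinity ball of radius eps (exact for a linear scorer).
    """
    current = int(np.argmax(W @ x))
    grad = W[target] - W[current]
    return x + eps * np.sign(grad)


# Toy setting: K = 2 classes, J = 1 expert, 2-dimensional inputs.
W = np.array([[1.0, 0.0],   # score for class 0
              [0.0, 1.0],   # score for class 1
              [0.6, 0.6]])  # score for deferring to expert 0
x = np.array([1.0, 0.2])

print(route(W @ x, num_classes=2))      # ('predict', 0)
x_adv = targeted_linf_attack(W, x, target=2, eps=0.3)
print(route(W @ x_adv, num_classes=2))  # ('defer', 0)
```

In this toy case a perturbation of at most 0.3 per coordinate turns a class-0 prediction into a deferral to the expert, the kind of targeted deferral manipulation that the paper's cost-sensitive adversarial surrogates aim to defend against.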

Related papers

Distributionally Robust Optimization with Adversarial Data Contamination
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
arXiv (2025-07-14T18:34:10Z)
Improving LLM Safety Alignment with Dual-Objective Optimization
Existing training-time safety alignment techniques for large language models (LLMs) remain vulnerable to jailbreak attacks. We propose an improved safety alignment that disentangles DPO objectives into two components: (1) robust refusal training, which encourages refusal even when partial unsafe generations are produced, and (2) targeted unlearning of harmful knowledge.
arXiv (2025-03-05T18:01:05Z)
Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts. Existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation. We present the first comprehensive study of adversarial robustness in two-stage L2D systems.
arXiv (2025-02-03T03:44:35Z)
Efficient Adversarial Training in LLMs with Continuous Attacks
Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses. C-AdvIPO is an adversarial variant of IPO that does not require utility data for adversarially robust alignment.
arXiv (2024-05-24T14:20:09Z)
Doubly Robust Instance-Reweighted Adversarial Training
We propose a novel doubly robust, instance-reweighted adversarial framework. Our importance weights are obtained by optimizing the KL-divergence regularized loss function. Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv (2023-08-01T06:16:18Z)
Boosting Adversarial Robustness using Feature Level Stochastic Smoothing
Adversarial defenses have led to a significant improvement in the robustness of Deep Neural Networks. In this work, we propose a generic method for introducing stochasticity in the network predictions. We also use this for smoothing decisions and rejecting low-confidence predictions.
arXiv (2023-06-10T15:11:24Z)
Improving Adversarial Robustness to Sensitivity and Invariance Attacks with Deep Metric Learning
A standard approach to adversarial robustness assumes a framework that defends against samples crafted by minimally perturbing a clean sample. We use metric learning to frame adversarial regularization as an optimal transport problem. Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariance and sensitivity defense.
arXiv (2022-11-04T13:54:02Z)
Certifiably-Robust Federated Adversarial Learning via Randomized Smoothing
In this paper, we incorporate smoothing techniques into federated adversarial training to enable data-private distributed learning. Our experiments show that such an advanced federated adversarial learning framework can deliver models as robust as those trained by the centralized training.
arXiv (2021-03-30T02:19:45Z)
Robust Pre-Training by Adversarial Contrastive Learning
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness. We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
arXiv (2020-10-26T04:44:43Z)
