On the Adversarial Robustness of Discrete Image Tokenizers
- URL: http://arxiv.org/abs/2602.18252v1
- Date: Fri, 20 Feb 2026 14:39:17 GMT
- Title: On the Adversarial Robustness of Discrete Image Tokenizers
- Authors: Rishika Bhagwatkar, Irina Rish, Nicolas Flammarion, Francesco Croce
- Abstract summary: We first formulate attacks that aim to perturb the features extracted by discrete tokenizers, and thus change the extracted tokens. We fine-tune popular tokenizers with unsupervised adversarial training, keeping all other components frozen. Our approach significantly improves robustness to both unsupervised and end-to-end supervised attacks and generalizes well to unseen tasks and data.
- Score: 56.377796750281796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discrete image tokenizers encode visual inputs as sequences of tokens from a finite vocabulary and are gaining popularity in multimodal systems, including encoder-only, encoder-decoder, and decoder-only models. However, unlike CLIP encoders, their vulnerability to adversarial attacks has not been explored. As the first work to study this topic, we first formulate attacks that aim to perturb the features extracted by discrete tokenizers, and thus change the extracted tokens. These attacks are computationally efficient, application-agnostic, and effective across classification, multimodal retrieval, and captioning tasks. Second, to defend against this vulnerability, inspired by recent work on robust CLIP encoders, we fine-tune popular tokenizers with unsupervised adversarial training, keeping all other components frozen. While unsupervised and task-agnostic, our approach significantly improves robustness to both unsupervised and end-to-end supervised attacks and generalizes well to unseen tasks and data. Unlike supervised adversarial training, our approach can leverage unlabeled images, making it more versatile. Overall, our work highlights the critical role of tokenizer robustness in downstream tasks and presents an important step in the development of safe multimodal foundation models.
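The two ideas in the abstract can be made concrete with a short, illustrative PyTorch sketch (not the authors' released code): `feature_attack` runs an unsupervised PGD attack that maximizes the distance between the tokenizer encoder's features on the clean and perturbed image, which tends to change the quantized tokens, and `robust_finetune_step` performs one FARE-style unsupervised adversarial training step that pulls the trainable encoder's features on adversarial images back toward a frozen copy's features on the clean images. The encoder interface, loss choice, and hyperparameters (epsilon, step size, number of steps) are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch, assuming a PyTorch nn.Module `encoder` that maps images in
# [0, 1] to continuous pre-quantization features; names and hyperparameters
# are illustrative placeholders.
import copy
import torch
import torch.nn.functional as F


def feature_attack(encoder, images, eps=8 / 255, alpha=2 / 255, steps=10):
    """Unsupervised PGD: maximize the feature-space distance between the
    clean and the perturbed image, which tends to flip the discrete tokens."""
    with torch.no_grad():
        clean_feats = encoder(images)
    delta = torch.empty_like(images).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        adv_feats = encoder((images + delta).clamp(0, 1))
        loss = F.mse_loss(adv_feats, clean_feats)        # distance to maximize
        grad = torch.autograd.grad(loss, delta)[0]       # grad w.r.t. delta only
        with torch.no_grad():
            delta += alpha * grad.sign()                 # ascend on the distance
            delta.clamp_(-eps, eps)
    return (images + delta).detach().clamp(0, 1)


def robust_finetune_step(encoder, frozen_encoder, images, optimizer):
    """One unsupervised adversarial training step (FARE-style): features of
    adversarial images are pulled toward the frozen encoder's clean features,
    so no labels are needed and all other model components stay untouched."""
    adv_images = feature_attack(encoder, images)
    with torch.no_grad():
        target_feats = frozen_encoder(images)            # clean-feature targets
    loss = F.mse_loss(encoder(adv_images), target_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage sketch: keep a frozen copy of the original tokenizer encoder as the
# clean-feature teacher, then fine-tune only the encoder on unlabeled images.
# frozen_encoder = copy.deepcopy(encoder).eval()
# for p in frozen_encoder.parameters():
#     p.requires_grad_(False)
# optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-5)
```

Because the training target is the frozen encoder's clean features rather than any label, the same loop can run on arbitrary unlabeled image collections, which is the versatility the abstract points to.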
Related papers
- PA-Attack: Guiding Gray-Box Attacks on LVLM Vision Encoders with Prototypes and Attention [63.63231191403825]
Large Vision-Language Models (LVLMs) are foundational to modern multimodal applications, yet their susceptibility to adversarial attacks remains a critical concern. We introduce PA-Attack (Prototype-Anchored Attentive Attack) to tackle the attribute-restricted issue and limited task generalization of vanilla attacks. Experiments show that PA-Attack achieves an average 75.1% score reduction rate (SRR), demonstrating strong attack effectiveness, efficiency, and task generalization in LVLMs.
arXiv Detail & Related papers (2026-02-23T01:20:43Z)
- ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models [59.47738955960352]
ToDRE is a two-stage and training-free token compression framework. It achieves superior performance by pruning tokens based on token Diversity and token-task RElevance.
arXiv Detail & Related papers (2025-05-24T15:47:49Z)
- Adversarial Robustness for Unified Multi-Modal Encoders via Efficient Calibration [12.763688592842717]
We present the first comprehensive study of adversarial vulnerability in unified multi-modal encoders. Non-visual inputs, such as audio and point clouds, are especially fragile. Our method improves adversarial robustness by up to 47.3 percent at epsilon = 4/255.
arXiv Detail & Related papers (2025-05-17T08:26:04Z)
- Manipulating Multimodal Agents via Cross-Modal Prompt Injection [34.35145839873915]
We identify a critical yet previously overlooked security vulnerability in multimodal agents. We propose CrossInject, a novel attack framework in which attackers embed adversarial perturbations across multiple modalities. Our method outperforms state-of-the-art attacks, achieving at least a +30.1% increase in attack success rates.
arXiv Detail & Related papers (2025-04-19T16:28:03Z)
- Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks [9.207022068713867]
We present a comprehensive empirical evaluation of the adversarial robustness of self-supervised vision encoders across multiple downstream tasks.
Our attacks operate in the encoder embedding space and at the downstream task output level.
Since the purpose of a foundation model is to cater to multiple applications at once, our findings reveal the need to enhance encoder robustness more broadly.
arXiv Detail & Related papers (2024-07-17T14:12:34Z)
- Language-Driven Anchors for Zero-Shot Adversarial Robustness [25.160195547250655]
We propose a Language-driven, Anchor-based Adversarial Training strategy.
By leveraging the semantic consistency of the text encoders, LAAT aims to enhance the adversarial robustness of the image model.
We show that LAAT significantly improves zero-shot adversarial robustness over state-of-the-art methods.
arXiv Detail & Related papers (2023-01-30T17:34:43Z)
- Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders.
We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
- Robustness of Unsupervised Representation Learning without Labels [92.90480374344777]
We propose a family of unsupervised robustness measures, which are model- and task-agnostic and label-free.
We validate our results against a linear probe and show that, for MOCOv2, adversarial training results in 3 times higher certified accuracy.
arXiv Detail & Related papers (2022-10-08T18:03:28Z)
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder, including one pre-processing, three in-processing, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z)
- Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders [23.2869445054295]
Self-supervised representation learning techniques encode images into rich features that are oblivious to downstream tasks.
The need for dedicated model designs and massive amounts of resources exposes image encoders to the risk of model stealing attacks.
We propose Cont-Steal, a contrastive-learning-based attack, and validate its improved stealing effectiveness in various experiment settings.
arXiv Detail & Related papers (2022-01-19T10:27:28Z) - Detection of Adversarial Supports in Few-shot Classifiers Using Feature
Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature preserving autoencoder filtering and also the concept of self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)