LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning
- URL: http://arxiv.org/abs/2508.01569v1
- Date: Sun, 03 Aug 2025 03:37:31 GMT
- Title: LetheViT: Selective Machine Unlearning for Vision Transformers via Attention-Guided Contrastive Learning
- Authors: Yujia Tong, Tian Zhang, Jingling Yuan, Yuze Wang, Chuang Hu
- Abstract summary: Vision Transformers (ViTs) have revolutionized computer vision tasks with their exceptional performance. This work addresses the particularly challenging scenario of random data forgetting in ViTs. We propose LetheViT, a contrastive unlearning method tailored for ViTs.
- Score: 8.104991333199264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision Transformers (ViTs) have revolutionized computer vision tasks with their exceptional performance. However, the introduction of privacy regulations such as GDPR and CCPA has brought new challenges for these models. These laws grant users the right to withdraw their data, necessitating not only the deletion of data but also the complete removal of its influence from trained models. Machine unlearning emerges as a critical solution: exact unlearning is computationally prohibitive, while approximate methods offer a more practical approach. This work addresses the particularly challenging scenario of random data forgetting in ViTs, where the model must forget specific samples while retaining others, even within the same class. We first reveal a core characteristic of ViTs through selective masking experiments: when high-attention areas are masked, the model retains its recognition capability but its memorization ability weakens significantly. Based on these insights, we propose LetheViT, a contrastive unlearning method tailored for ViTs. LetheViT uses masked image inputs to generate positive logits and original image inputs to generate negative logits, guiding the model to forget specific details while retaining the general class outlines. Experimental results demonstrate that LetheViT achieves state-of-the-art performance, effectively balancing privacy compliance with model efficacy.
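The abstract describes the mechanism only at a high level. The sketch below illustrates one plausible reading of it: mask the high-attention patches of a forget-set image, treat the model's logits on the masked view as positive logits and a frozen pre-unlearning model's logits on the original view as negative logits, and apply an InfoNCE-style contrastive loss. Every name and design choice here (mask_high_attention, contrastive_unlearn_loss, the frozen reference model, the exact loss form) is an illustrative assumption, not the authors' released implementation.

```python
# Hypothetical sketch of attention-guided contrastive unlearning in the
# spirit of LetheViT; names, loss form, and the frozen reference model
# are assumptions for illustration only.
import torch
import torch.nn.functional as F

def mask_high_attention(images, attn_maps, mask_ratio=0.5, patch=16):
    """Zero out the image patches that receive the highest attention.

    images:    (B, 3, H, W) batch of forget-set images
    attn_maps: (B, N) per-patch attention scores, e.g. from the CLS token
    """
    B, _, H, W = images.shape
    n_mask = int(attn_maps.shape[1] * mask_ratio)
    top_idx = attn_maps.topk(n_mask, dim=1).indices  # highest-attention patches
    masked = images.clone()
    patches_per_row = W // patch
    for b in range(B):
        for idx in top_idx[b]:
            i = int(idx)
            r = (i // patches_per_row) * patch
            c = (i % patches_per_row) * patch
            masked[b, :, r:r + patch, c:c + patch] = 0.0
    return masked

def contrastive_unlearn_loss(model, frozen_model, images, attn_maps, tau=0.5):
    """Pull the unlearned model toward its own predictions on masked inputs
    (positives) and away from the pre-unlearning model's predictions on the
    original inputs (negatives)."""
    masked = mask_high_attention(images, attn_maps)
    z = F.normalize(model(images), dim=1)                 # anchor: current model
    with torch.no_grad():
        z_pos = F.normalize(model(masked), dim=1)         # positive logits
        z_neg = F.normalize(frozen_model(images), dim=1)  # negative logits
    sim_pos = (z * z_pos).sum(dim=1) / tau
    sim_neg = (z * z_neg).sum(dim=1) / tau
    # InfoNCE-style objective: high similarity to the masked view, low
    # similarity to the original model's memorized response.
    return -torch.log(sim_pos.exp() / (sim_pos.exp() + sim_neg.exp())).mean()
```

In practice a loss like this would presumably be paired with a standard retention objective (e.g., cross-entropy on the retain set) so that accuracy on the remaining data is preserved while the forget-set details are erased.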
Related papers
- SVD-ViT: Does SVD Make Vision Transformers Attend More to the Foreground? [17.159633200689225]
Vision Transformers (ViT) have been established as large-scale foundation models. We propose SVD-ViT, which prioritizes the learning of foreground features. Experimental results demonstrate that our method improves classification accuracy and effectively learns informative foreground representations.
arXiv Detail & Related papers (2026-02-02T20:17:34Z) - ReViP: Reducing False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance [50.05984919728878]
We present ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual grounding and robustness under perturbations. Specifically, we use an external VLM as a task-stage observer to extract real-time task-centric visual cues from visual observations. To evaluate false completion, we propose the first False-Completion Benchmark Suite built on LIBERO with controlled settings such as Object-Drop.
arXiv Detail & Related papers (2026-01-23T11:31:07Z) - No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning [0.0]
Vision-language models (VLMs) and transfer learning with pre-trained visual models appear to be promising techniques for dealing with this problem. This paper proposes a novel zero-shot image classification framework that combines a VLM and a pre-trained visual model within a self-learning cycle.
arXiv Detail & Related papers (2025-09-23T12:54:52Z) - ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs [98.27348724529257]
We introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Models trained with the ViCrit task exhibit substantial gains across a variety of vision-language benchmarks.
arXiv Detail & Related papers (2025-06-11T19:16:54Z) - On the Surprising Effectiveness of Attention Transfer for Vision Transformers [118.83572030360843]
Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations.
We investigate this assumption and find that the features and representations learned during pre-training are not essential.
arXiv Detail & Related papers (2024-11-14T18:59:40Z) - SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization [39.09638432514626]
Vision Transformers (ViTs) are increasingly used in computer vision due to their high performance, but their vulnerability to adversarial attacks is a concern.
This study introduces SpecFormer, tailored to fortify ViTs against adversarial attacks, with theoretical underpinnings.
arXiv Detail & Related papers (2024-01-02T14:27:24Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a Forgery-aware Adaptive Vision Transformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - Interpretability-Aware Vision Transformer [12.406960223371959]
Vision Transformers (ViTs) have become prominent models for solving various vision tasks. We introduce a novel training procedure that inherently enhances model interpretability. IA-ViT is composed of a feature extractor, a predictor, and an interpreter, which are trained jointly with an interpretability-aware training objective.
arXiv Detail & Related papers (2023-09-14T21:50:49Z) - Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers [1.694405932826705]
Visual transformers (ViT) have emerged as a powerful tool for image classification.
In this work, we explore Semi-ViT, a ViT model fine-tuned using semi-supervised learning techniques.
Our results demonstrate that Semi-ViT outperforms traditional convolutional neural networks (CNN) and ViTs, even when fine-tuned with limited annotated data.
arXiv Detail & Related papers (2023-05-17T07:51:35Z) - Supervised Masked Knowledge Distillation for Few-Shot Transformers [36.46755346410219]
We propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers.
Compared with previous self-supervised methods, we allow intra-class knowledge distillation on both class and patch tokens.
Our method, despite its simple design, outperforms previous methods by a large margin and achieves a new state-of-the-art.
arXiv Detail & Related papers (2023-03-25T03:31:46Z) - Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates fine-tuning ViTs on few-shot learning tasks.
Key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA).
We conduct extensive experiments to show the efficacy of our model.
arXiv Detail & Related papers (2023-01-06T08:42:05Z) - Rectify ViT Shortcut Learning by Visual Saliency [40.55418820114868]
Shortcut learning is common but harmful to deep learning models.
In this work, we propose a novel and effective saliency-guided vision transformer (SGT) model to rectify shortcut learning.
arXiv Detail & Related papers (2022-06-17T05:54:07Z) - Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN with ViTs significantly surpasses other ViT-based few-shot learning frameworks and is the first to outperform CNN state-of-the-art methods.
arXiv Detail & Related papers (2022-03-14T12:53:27Z) - Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)