Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation
- URL: http://arxiv.org/abs/2303.11611v2
- Date: Mon, 18 Dec 2023 13:05:59 GMT
- Title: Out of Thin Air: Exploring Data-Free Adversarial Robustness Distillation
- Authors: Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Pinxue Guo, Kaixun Jiang,
Wenqiang Zhang, Lizhe Qi
- Abstract summary: We propose Data-Free Adversarial Robustness Distillation (DFARD) to train small, easily deployable, robust models without relying on data.
Inspired by human education, we design a plug-and-play Interactive Temperature Adjustment (ITA) strategy to improve the efficiency of knowledge transfer.
- Score: 26.744403789694758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial Robustness Distillation (ARD) is a promising task to solve the
issue of limited adversarial robustness of small capacity models while
optimizing the expensive computational costs of Adversarial Training (AT).
Despite their good robust performance, existing ARD methods remain
impractical to deploy in real high-security scenarios, because they rely
entirely on the original data or on publicly available data with a similar
distribution. In fact, such data are almost always private, specific, and
distinctive in scenarios that require high robustness. To tackle these issues, we propose a
challenging but significant task called Data-Free Adversarial Robustness
Distillation (DFARD), which aims to train small, easily deployable, robust
models without relying on data. We demonstrate that the challenge lies in a
lower upper bound on the information available for knowledge transfer, making
it crucial to mine and transfer knowledge more efficiently. Inspired by human
education, we design a plug-and-play Interactive Temperature Adjustment (ITA)
strategy to improve the efficiency of knowledge transfer and propose an
Adaptive Generator Balance (AGB) module to retain more data information. Our
method uses adaptive hyperparameters to avoid extensive parameter tuning and
significantly outperforms combinations of existing techniques.
Meanwhile, our method achieves stable and reliable performance on multiple
benchmarks.
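The abstract describes a data-free distillation loop: a generator synthesizes pseudo-data in place of the unavailable training set, and the student is trained to match the teacher's temperature-scaled outputs on it, with ITA adapting the temperature and AGB balancing the generator's objectives. The exact ITA and AGB formulas are not given in this summary, so the sketch below is only a minimal, hypothetical illustration of the underlying loop under those assumptions; `generator`, `teacher`, `student`, and the scalar `temperature` are placeholder PyTorch modules and values, not the authors' released code.

```python
# Minimal sketch of one data-free distillation step (assumed, not the DFARD release).
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, temperature):
    """Temperature-scaled KL divergence between teacher and student soft outputs."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    # The usual T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2


def distillation_step(generator, teacher, student, student_opt, temperature,
                      batch_size=128, z_dim=100):
    """One student update: synthesize a batch with the generator, then distill on it."""
    z = torch.randn(batch_size, z_dim)
    # Pseudo-data stands in for the private training set; detached because only
    # the student is updated in this step.
    synthetic = generator(z).detach()
    with torch.no_grad():
        teacher_logits = teacher(synthetic)
    student_logits = student(synthetic)
    loss = kd_loss(student_logits, teacher_logits, temperature)
    student_opt.zero_grad()
    loss.backward()
    student_opt.step()
    return loss.item()
```

In practice the generator is trained with its own objective (in data-free distillation this is often maximizing teacher-student disagreement), and the paper's contribution is precisely that the temperature and the generator's loss weighting are set adaptively rather than hand-tuned; both of those pieces are omitted from this sketch.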
Related papers
- Improving Noise Efficiency in Privacy-preserving Dataset Distillation [59.57846442477106]
We introduce a novel framework that decouples sampling from optimization for better convergence and improves signal quality.
On CIFAR-10, our method achieves a 10.0% improvement with 50 images per class and an 8.3% increase with just one-fifth the distilled set size of previous state-of-the-art methods.
arXiv Detail & Related papers (2025-08-03T13:15:52Z)
- Lightweight Task-Oriented Semantic Communication Empowered by Large-Scale AI Models [66.57755931421285]
Large-scale artificial intelligence (LAI) models pose significant challenges for real-time communication scenarios.
This paper proposes utilizing knowledge distillation (KD) techniques to extract and condense knowledge from LAI models.
We propose a fast distillation method featuring a pre-stored compression mechanism that eliminates the need for repetitive inference.
arXiv Detail & Related papers (2025-06-16T08:42:16Z)
- Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay [61.823835392216544]
Reinforcement learning (RL) has become an effective approach for fine-tuning large language models (LLMs).
We propose two techniques to improve data efficiency in LLM RL fine-tuning: difficulty-targeted online data selection and rollout replay.
Our method reduces RL fine-tuning time by 25% to 65% while reaching the same level of performance as the original GRPO algorithm.
arXiv Detail & Related papers (2025-06-05T17:55:43Z)
- VAE-based Feature Disentanglement for Data Augmentation and Compression in Generalized GNSS Interference Classification [42.14439854721613]
We propose variational autoencoders (VAEs) for disentanglement to extract essential latent features that enable accurate classification of interferences.
Our proposed VAE achieves a data compression rate ranging from 512 to 8,192 and achieves an accuracy up to 99.92%.
arXiv Detail & Related papers (2025-04-14T13:38:00Z)
- LAPD: Langevin-Assisted Bayesian Active Learning for Physical Discovery [5.373182035720355]
Langevin-Assisted Active Physical Discovery (LAPD) is a framework that integrates replica-exchange gradient Langevin Monte Carlo.
LAPD achieves reliable, uncertainty-aware identification with noisy data.
We evaluate LAPD on diverse nonlinear systems such as the Lotka-Volterra, Lorenz, Burgers, and Convection-Diffusion equations.
arXiv Detail & Related papers (2025-03-04T20:17:24Z)
- Towards Robust Stability Prediction in Smart Grids: GAN-based Approach under Data Constraints and Adversarial Challenges [53.2306792009435]
We introduce a novel framework to detect instability in smart grids by employing only stable data.
It relies on a Generative Adversarial Network (GAN) where the generator is trained to create instability data that are used along with stable data to train the discriminator.
Our solution, tested on a dataset composed of real-world stable and unstable samples, achieves accuracy up to 97.5% in predicting grid stability and up to 98.9% in detecting adversarial attacks.
arXiv Detail & Related papers (2025-01-27T20:48:25Z)
- A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation [0.562479170374811]
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift.
Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data.
We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness.
arXiv Detail & Related papers (2024-11-23T17:35:23Z)
- Hammer: Towards Efficient Hot-Cold Data Identification via Online Learning [12.60117987781744]
Management of storage resources in big data and cloud computing environments requires accurate identification of data's "cold" and "hot" states.
Traditional methods, such as rule-based algorithms and early AI techniques, often struggle with dynamic workloads.
We propose a novel solution based on online learning strategies, achieving higher accuracy and lower operational costs.
arXiv Detail & Related papers (2024-11-22T06:46:16Z)
- Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses a risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose two novel techniques for robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z)
- Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability [73.34532767873785]
We propose the concept of "Information Density" (ID) to indicate whether a matrix strongly belongs to certain feature spaces.
We introduce the Dense Information Prompt (DIP) to enhance information density to improve generalization.
DIP significantly reduces the number of tunable parameters and the requisite storage space, making it particularly advantageous in resource-constrained settings.
arXiv Detail & Related papers (2023-12-17T20:42:43Z)
- PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce Bayesian Adaptive Moment Regularization (BAdam), a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning.
SSD is fast, performant, and does not require long-term storage of the training data.
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
- SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models [28.764782216513037]
Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning.
We propose a method called SLoRA, which overcomes the key limitations of LoRA in highly heterogeneous data scenarios.
Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning.
arXiv Detail & Related papers (2023-08-12T10:33:57Z)
- Annealing Self-Distillation Rectification Improves Adversarial Training [0.10241134756773226]
We analyze the characteristics of robust models and identify that robust models tend to produce smoother and well-calibrated outputs.
We propose Annealing Self-Distillation Rectification (ADR), which generates soft labels as a better guidance mechanism.
We demonstrate the efficacy of ADR through extensive experiments and strong performance across datasets.
arXiv Detail & Related papers (2023-05-20T06:35:43Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves on standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)