Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
- URL: http://arxiv.org/abs/2302.12480v1
- Date: Fri, 24 Feb 2023 06:44:19 GMT
- Title: Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
- Authors: Ruisi Cai, Zhenyu Zhang, Zhangyang Wang
- Abstract summary: Given a robust model trained to be resilient to one or multiple types of distribution shifts, how is that "robustness" encoded in the model weights?
We propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted RWSs.
In this way, injecting a certain type of robustness into the model reduces to directly adding the corresponding RWS to its weights.
- Score: 81.77457373726736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a robust model trained to be resilient to one or multiple types of
distribution shifts (e.g., natural image corruptions), how is that "robustness"
encoded in the model weights, and how easily can it be disentangled and/or
"zero-shot" transferred to some other models? This paper empirically suggests a
surprisingly simple answer: linearly - by straightforward model weight
arithmetic! We start by drawing several key observations: (1) assuming that we
train the same model architecture on both a clean dataset and its corrupted
version, the resultant weights mostly differ in the shallow layers; (2) the weight
difference after projection, which we call the "Robust Weight Signature" (RWS),
appears to be discriminative and indicative of different corruption types;
(3) for the same corruption type, the RWSs obtained by one model architecture
are highly consistent and transferable across different datasets.
We propose a minimalistic model robustness "patching" framework that carries
a model trained on clean data together with its pre-extracted RWSs. In this
way, injecting a certain type of robustness into the model reduces to directly
adding the corresponding RWS to its weights. We verify our proposed framework
to be remarkably (1) lightweight: since RWSs concentrate on the shallowest few
layers and, as we further show, can be painlessly quantized, storing an RWS is
up to 13x more compact than storing the full weight copy; (2) in-situ
adjustable: RWSs can be appended as needed and later taken off to restore the
intact clean model, and we further demonstrate that an RWS can be linearly
re-scaled to control the patched robustness strength; (3) composable: multiple
RWSs can be added simultaneously to patch more comprehensive robustness at
once; and (4) transferable: even when the clean model backbone is continually
adapted or updated, RWSs remain effective patches thanks to their outstanding
cross-dataset transferability.
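Since the patching step is plain weight arithmetic, it can be illustrated in a few lines of code. The sketch below is a minimal illustration under stated assumptions, not the authors' released implementation: it assumes PyTorch models sharing one architecture, and the shallow-layer name prefixes and function names (extract_rws, patch_with_rws) are hypothetical placeholders.

```python
# Minimal sketch of RWS-style weight patching. Assumes two PyTorch models with
# identical architectures; the "shallow" layer-name prefixes below are merely
# illustrative (e.g., for a torchvision ResNet), not the paper's exact choice.
import copy


def extract_rws(clean_model, robust_model, shallow_prefixes=("conv1", "bn1", "layer1")):
    """Robust Weight Signature: the weight difference, kept only for shallow layers."""
    clean_sd = clean_model.state_dict()
    robust_sd = robust_model.state_dict()
    rws = {}
    for name, w_clean in clean_sd.items():
        if name.startswith(shallow_prefixes) and w_clean.is_floating_point():
            rws[name] = robust_sd[name] - w_clean  # the linear "signature"
    return rws


def patch_with_rws(clean_model, rws_list, scales=None):
    """Add (optionally re-scaled) RWSs onto a copy of the clean model's weights."""
    scales = scales if scales is not None else [1.0] * len(rws_list)
    patched = copy.deepcopy(clean_model)
    sd = patched.state_dict()
    for rws, s in zip(rws_list, scales):
        for name, delta in rws.items():
            sd[name] = sd[name] + s * delta  # straightforward weight arithmetic
    patched.load_state_dict(sd)
    return patched
```

In this sketch, unpatching amounts to subtracting the same deltas (or reloading the stored clean weights), the per-RWS scale controls the patched robustness strength, and passing several RWSs at once composes them; because each RWS covers only the shallowest layers, storing it (optionally quantized) is far more compact than keeping a full robust weight copy.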
Related papers
- Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers [4.015569252776372]
ArchonView significantly exceeds state-of-the-art methods despite being trained from scratch on 3D rendering data only, with no 2D pretraining.
Our model also exhibits robust performance even for difficult camera poses where previous methods fail, and is several times faster at inference than diffusion-based approaches.
arXiv Detail & Related papers (2025-03-17T17:59:59Z)
- Efficient and Versatile Robust Fine-Tuning of Zero-shot Models [34.27380518351181]
We introduce Robust Adapter (R-Adapter), a novel method for fine-tuning zero-shot models to downstream tasks.
Our method integrates lightweight modules into the pre-trained model and employs novel self-ensemble techniques to boost OOD robustness and reduce storage expenses substantially.
Our experiments demonstrate that R-Adapter achieves state-of-the-art performance across a diverse set of tasks, tuning only 13% of the parameters of the CLIP encoders.
arXiv Detail & Related papers (2024-08-11T11:37:43Z)
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- Verifix: Post-Training Correction to Improve Label Noise Robustness with Verified Samples [9.91998873101083]
Post-Training Correction adjusts model parameters after initial training to mitigate label noise.
We introduce Verifix, a novel algorithm that leverages a small, verified dataset to correct the model weights using a single update.
Experiments on the CIFAR dataset with 25% synthetic corruption show 7.36% generalization improvements on average.
arXiv Detail & Related papers (2024-03-13T15:32:08Z)
- Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts [22.74552390076515]
We probe the representation spaces of 16 robust zero-shot CLIP vision encoders with various backbones and pretraining sets.
We detect the presence of outlier features in robust zero-shot CLIP vision encoders, which to the best of our knowledge is the first time these are observed in non-transformer models.
We find the existence of outlier features to be an indication of ImageNet shift robustness in models, since we only find them in robust models in our analysis.
arXiv Detail & Related papers (2023-10-19T17:59:12Z)
- Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks [120.23328563831704]
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data.
We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce.
arXiv Detail & Related papers (2023-10-05T19:00:49Z)
- RoMa: Robust Dense Feature Matching [17.015362716393216]
Feature matching is an important computer vision task that involves estimating correspondences between two images of a 3D scene.
We propose a model, leveraging frozen pretrained features from the foundation model DINOv2.
To further improve robustness, we propose a tailored transformer match decoder.
arXiv Detail & Related papers (2023-05-24T17:59:04Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of the parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
- Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels [56.81761908354718]
We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
arXiv Detail & Related papers (2023-01-02T07:13:28Z)
- FlexiViT: One Model for All Patch Sizes [100.52574011880571]
Vision Transformers convert images to sequences by slicing them into patches.
The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost.
We show that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes.
arXiv Detail & Related papers (2022-12-15T18:18:38Z)