Designing Robust Transformers using Robust Kernel Density Estimation
- URL: http://arxiv.org/abs/2210.05794v3
- Date: Wed, 8 Nov 2023 14:50:22 GMT
- Title: Designing Robust Transformers using Robust Kernel Density Estimation
- Authors: Xing Han and Tongzheng Ren and Tan Minh Nguyen and Khai Nguyen and
Joydeep Ghosh and Nhat Ho
- Abstract summary: We introduce a series of self-attention mechanisms that can be incorporated into different Transformer architectures.
We then perform extensive empirical studies on language modeling and image classification tasks.
- Score: 47.494628752936165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Transformer architectures have empowered their empirical
success in a variety of tasks across different domains. However, existing works
mainly focus on predictive accuracy and computational cost, without considering
other practical issues, such as robustness to contaminated samples. Recent work
by Nguyen et al. (2022) has shown that the self-attention mechanism, which is
at the center of the Transformer architecture, can be viewed as a non-parametric
estimator based on kernel density estimation (KDE). This motivates us to
leverage a set of robust kernel density estimation methods for alleviating the
issue of data contamination. Specifically, we introduce a series of
self-attention mechanisms that can be incorporated into different Transformer
architectures and discuss the special properties of each method. We then
perform extensive empirical studies on language modeling and image
classification tasks. Our methods demonstrate robust performance in multiple
scenarios while maintaining competitive results on clean datasets.
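The KDE view of attention in the abstract can be made concrete with a small numerical sketch. The code below is illustrative only and is not the authors' construction: it writes standard softmax attention as a kernel-weighted (Nadaraya-Watson style) average over key/value pairs, and then shows one generic robust-estimation idea, a median-of-means aggregation over random blocks of keys, as a stand-in for the robust KDE methods discussed in the paper. The function names, block scheme, and toy data are assumptions made for illustration.
```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard self-attention: each output row is a kernel-weighted
    (Nadaraya-Watson style) average of the value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # exponential-kernel logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # normalize, as in KDE
    return weights @ V

def median_of_means_attention(Q, K, V, n_blocks=4, seed=0):
    """Hypothetical robust variant: split key/value pairs into random blocks,
    attend within each block, and take the coordinate-wise median across
    blocks so that a few contaminated pairs cannot dominate the estimate."""
    idx = np.random.default_rng(seed).permutation(K.shape[0])
    blocks = np.array_split(idx, n_blocks)
    outputs = np.stack([softmax_attention(Q, K[b], V[b]) for b in blocks])
    return np.median(outputs, axis=0)

# Toy usage: contaminate a few key/value pairs and compare the two estimators.
rng = np.random.default_rng(1)
Q = rng.normal(size=(2, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
K[:3] += 10.0
V[:3] += 10.0
print(softmax_attention(Q, K, V)[0])
print(median_of_means_attention(Q, K, V)[0])
```
In this toy setup the median-of-means output moves far less under contamination than the standard estimate, which is the qualitative behavior a robust attention mechanism is designed to achieve.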
Related papers
- Reward driven workflows for unsupervised explainable analysis of phases and ferroic variants from atomically resolved imaging data [14.907891992968361]
We show that a reward-driven approach can be used to optimize key hyperparameters in unsupervised ML methods.
This approach allows us to discover local descriptors that are best aligned with the specific physical behavior.
We also extend the reward-driven approach to disentangle structural factors of variation via a variational autoencoder (VAE).
arXiv Detail & Related papers (2024-11-19T16:18:20Z) - Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence [92.07601770031236]
We investigate semantically meaningful patterns in the attention heads of an encoder-only Transformer architecture.
We find that fixing the attention weights not only accelerates the training process but also enhances the stability of the optimization.
arXiv Detail & Related papers (2024-09-20T07:41:47Z) - Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process [8.32027826756131]
The proposed framework is demonstrated and analyzed on three engineering case studies.
It provides improved predictive accuracy over a single-source model and a transformed but source-unaware model.
arXiv Detail & Related papers (2024-07-15T22:27:04Z) - Compressing Image-to-Image Translation GANs Using Local Density Structures on Their Learned Manifold [69.33930972652594]
Generative Adversarial Networks (GANs) have shown remarkable success in modeling complex data distributions for image-to-image translation.
Existing GAN compression methods mainly rely on knowledge distillation or on pruning techniques borrowed from convolutional classifiers.
We propose a new approach by explicitly encouraging the pruned model to preserve the density structure of the original parameter-heavy model on its learned manifold.
Our experiments on image translation GAN models, Pix2Pix and CycleGAN, with various benchmark datasets and architectures demonstrate our method's effectiveness.
arXiv Detail & Related papers (2023-12-22T15:43:12Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z) - Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
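For illustration only, and assuming that "patch similarity" here refers to pairwise cosine similarity between per-patch features produced by a self-attention block (the exact metric and how PSAQ-ViT uses it follow the cited paper, not this snippet), a minimal sketch:
```python
import numpy as np

def patch_similarity(features):
    """features: (num_patches, dim) per-patch outputs of a self-attention block.
    Returns the matrix of pairwise cosine similarities between patches."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T

# Gaussian-noise inputs tend to produce a flatter similarity pattern than real
# images, which is the kind of cue a data-free method can exploit.
noise_features = np.random.default_rng(0).normal(size=(16, 64))
print(patch_similarity(noise_features).round(2))
```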
arXiv Detail & Related papers (2022-03-04T11:47:20Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
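As a rough sketch of the ATC recipe summarized above (assuming a max-softmax confidence score; the cited paper also considers other scores and calibration details not reproduced here):
```python
import numpy as np

def learn_atc_threshold(source_confidences, source_correct):
    """Choose the threshold so that the fraction of labeled source examples
    whose confidence exceeds it matches the observed source accuracy."""
    source_accuracy = source_correct.mean()
    return np.quantile(source_confidences, 1.0 - source_accuracy)

def predict_target_accuracy(target_confidences, threshold):
    """Predicted target accuracy = fraction of unlabeled target examples
    whose confidence exceeds the learned threshold."""
    return float((target_confidences > threshold).mean())
```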
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Transformer Uncertainty Estimation with Hierarchical Stochastic Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z) - Multi-Facet Clustering Variational Autoencoders [9.150555507030083]
High-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over.
We introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE).
MFCVAE learns multiple clusterings simultaneously, and is trained fully unsupervised and end-to-end.
arXiv Detail & Related papers (2021-06-09T17:36:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.