Quantile Activation: Correcting a Failure Mode of ML Models
- URL: http://arxiv.org/abs/2405.11573v3
- Date: Thu, 03 Apr 2025 00:10:12 GMT
- Title: Quantile Activation: Correcting a Failure Mode of ML Models
- Authors: Aditya Challa, Sravan Danda, Laurent Najman, Snehanshu Saha
- Abstract summary: We propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational costs. QAct outputs the relative quantile position of neuron activations within their context distribution. We find that this approach unexpectedly outperforms DINOv2 (small), despite DINOv2 being trained with a much larger network and dataset.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Standard ML models fail to infer the context distribution and suitably adapt. For instance, learning fails when the underlying distribution is actually a mixture of distributions with contradictory labels. Learning also fails if there is a shift between the train and test distributions. Standard neural network architectures like MLPs or CNNs are not equipped to handle this. In this article, we propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational costs. The core idea is to "adapt" the outputs of each neuron to its context distribution. The proposed quantile activation (QAct) outputs the relative quantile position of neuron activations within their context distribution, diverging from the direct numerical outputs common in traditional networks. A specific case of the above failure mode occurs when there is an inherent distribution shift, i.e., the test distribution differs slightly from the train distribution. We validate the proposed activation function under covariate shifts, using datasets designed to test robustness against distortions. Our results demonstrate significantly better generalization across distortions compared to conventional classifiers and other adaptive methods, across various architectures. Although this paper presents a proof of concept, we find that this approach unexpectedly outperforms DINOv2 (small), despite DINOv2 being trained with a much larger network and dataset.
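The abstract does not spell out the formulation, but the stated core idea (each neuron outputs the relative quantile position of its activation within a context distribution) can be sketched as follows. The function name, the use of the mini-batch as the context, and the hard empirical CDF are illustrative assumptions, not the paper's exact construction:

```python
import torch

def quantile_activation(x: torch.Tensor, context: torch.Tensor | None = None) -> torch.Tensor:
    """Map each pre-activation to its relative quantile position within a
    context distribution, here approximated by the current mini-batch.

    x:       (batch, features) pre-activations.
    context: (n, features) samples defining the context; defaults to x itself.
    Returns values in [0, 1]: the per-neuron empirical CDF evaluated at x.
    The hard comparison is not differentiable, so training would need a
    smooth surrogate; this sketch only illustrates the forward mapping.
    """
    if context is None:
        context = x
    # Fraction of context samples below each activation, computed per neuron.
    # Shapes: (batch, 1, features) vs (1, n, features) -> (batch, n, features).
    return (x.unsqueeze(1) > context.unsqueeze(0)).float().mean(dim=1)
```

Used in place of a fixed pointwise nonlinearity, such a mapping makes each neuron's output depend on the context it is evaluated in, which is what allows the layer to adapt when the test distribution drifts from the training distribution.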
Related papers
- Learning Neural Networks with Distribution Shift: Efficiently Certifiable Guarantees [13.936051653540144]
We give the first efficient algorithms for learning neural networks with distribution shift.
We work in the Testable Learning with Distribution Shift framework.
arXiv Detail & Related papers (2025-02-22T00:48:03Z) - Generative Conditional Distributions by Neural (Entropic) Optimal Transport [12.152228552335798]
We introduce a novel neural entropic optimal transport method designed to learn generative models of conditional distributions.
Our method relies on the minimax training of two neural networks.
Our experiments on real-world datasets show the effectiveness of our algorithm compared to state-of-the-art conditional distribution learning techniques.
arXiv Detail & Related papers (2024-06-04T13:45:35Z) - FedUV: Uniformity and Variance for Heterogeneous Federated Learning [5.9330433627374815]
Federated learning is a promising framework to train neural networks with widely distributed data.
Recent work has shown this is due to the final layer of the network being most prone to local bias.
We investigate the training dynamics of the classifier by applying SVD to its weights, motivated by the observation that freezing the weights results in constant singular values.
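As a rough illustration of the SVD probe mentioned above, the hypothetical helper below just extracts the singular values of a final linear classification layer; it is not the FedUV training objective:

```python
import torch

def classifier_singular_values(classifier: torch.nn.Linear) -> torch.Tensor:
    """Singular values of the final (classification) layer's weight matrix.

    Tracking how these values drift across local training rounds is one way
    to probe classifier bias; with frozen weights they stay constant by
    construction, matching the observation cited above.
    """
    with torch.no_grad():
        # weight has shape (num_classes, feature_dim)
        return torch.linalg.svdvals(classifier.weight)
```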
arXiv Detail & Related papers (2024-02-27T15:53:15Z) - An Upper Bound for the Distribution Overlap Index and Its Applications [18.481370450591317]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z) - Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
The Interpretable Deep Classifier (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes its decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
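As a generic illustration of retrieving nearest source samples as decision evidence, a plain nearest-neighbour lookup in feature space is sketched below; the paper's IDC learns this evidence rather than computing it post hoc:

```python
import torch

def nearest_source_evidence(target_feat: torch.Tensor,
                            source_feats: torch.Tensor,
                            k: int = 5) -> torch.Tensor:
    """Indices of the k source samples closest to a single target sample.

    target_feat:  (feature_dim,) feature vector of one target-domain sample.
    source_feats: (n_source, feature_dim) source-domain feature bank.
    """
    dists = torch.cdist(target_feat.unsqueeze(0), source_feats).squeeze(0)
    return dists.topk(k, largest=False).indices
```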
arXiv Detail & Related papers (2022-11-15T15:58:56Z) - GFlowOut: Dropout with Generative Flow Networks [76.59535235717631]
Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference.
Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference.
GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks.
arXiv Detail & Related papers (2022-10-24T03:00:01Z) - Understanding the Covariance Structure of Convolutional Filters [86.0964031294896]
Recent ViT-inspired convolutional networks such as ConvMixer and ConvNeXt use large-kernel depthwise convolutions with notable structure.
We first observe that such learned filters have highly-structured covariance matrices, and we find that covariances calculated from small networks may be used to effectively initialize a variety of larger networks.
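A minimal sketch of the covariance-transfer idea, assuming filters are flattened to vectors and new filters are sampled from a Gaussian fitted to the learned ones; the paper's exact initialization recipe may differ:

```python
import torch

def covariance_init(filters: torch.Tensor, n_new: int) -> torch.Tensor:
    """Sample new depthwise kernels from a Gaussian whose mean and covariance
    are estimated from kernels learned by a smaller network.

    filters: (n_old, k, k) learned kernels; returns (n_new, k, k) samples.
    """
    n_old, k, _ = filters.shape
    flat = filters.reshape(n_old, k * k)
    mean = flat.mean(dim=0)
    centered = flat - mean
    cov = centered.T @ centered / (n_old - 1)
    # A small jitter keeps the estimated covariance positive definite.
    cov = cov + 1e-6 * torch.eye(k * k)
    dist = torch.distributions.MultivariateNormal(mean, covariance_matrix=cov)
    return dist.sample((n_new,)).reshape(n_new, k, k)
```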
arXiv Detail & Related papers (2022-10-07T15:59:13Z) - Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning [4.97235247328373]
This work addresses meta-learning (ML) by considering deep networks with local winner-takes-all (LWTA) activations.
This type of network unit yields sparse representations in each model layer, as the units are organized into blocks where only one unit generates a non-zero output.
Our approach produces state-of-the-art predictive accuracy on few-shot image classification and regression experiments, as well as reduced predictive error on an active learning setting.
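For illustration, the block-wise sparsity pattern described above can be written as a deterministic local winner-takes-all activation; the paper's units are stochastic (the winner is sampled), so this is only a simplified sketch:

```python
import torch

def lwta(x: torch.Tensor, block_size: int = 2) -> torch.Tensor:
    """Hard local winner-takes-all: within each block of `block_size` units,
    keep the largest activation and zero out the rest."""
    b, d = x.shape
    assert d % block_size == 0, "feature dim must be divisible by block_size"
    blocks = x.view(b, d // block_size, block_size)
    winners = blocks.argmax(dim=-1, keepdim=True)
    mask = torch.zeros_like(blocks).scatter_(-1, winners, 1.0)
    return (blocks * mask).view(b, d)
```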
arXiv Detail & Related papers (2022-08-02T16:19:54Z) - Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input graph's features that drives the prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
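The abstract does not detail the correction procedure, but a generic non-parametric adaptation in the same spirit is per-feature quantile matching of test-time activations to the training-time activation distribution; the helper below and its quantile grid are assumptions for illustration only:

```python
import torch

def match_activation_distribution(test_act: torch.Tensor,
                                  train_act: torch.Tensor,
                                  n_quantiles: int = 100) -> torch.Tensor:
    """Map each test-time activation to the training-time value sitting at the
    same empirical quantile, independently for every feature."""
    qs = torch.linspace(0.0, 1.0, n_quantiles, device=test_act.device)
    train_q = torch.quantile(train_act, qs, dim=0)   # (n_quantiles, features)
    test_q = torch.quantile(test_act, qs, dim=0)     # (n_quantiles, features)
    corrected = torch.empty_like(test_act)
    for j in range(test_act.shape[1]):
        # Position of each test activation on the test quantile grid ...
        idx = torch.searchsorted(test_q[:, j].contiguous(),
                                 test_act[:, j].contiguous())
        idx = idx.clamp(max=n_quantiles - 1)
        # ... mapped onto the training quantile values at the same positions.
        corrected[:, j] = train_q[idx, j]
    return corrected
```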
arXiv Detail & Related papers (2021-10-05T11:36:25Z) - Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z) - KL Guided Domain Adaptation [88.19298405363452]
Domain adaptation is an important problem and often needed for real-world applications.
A common approach in the domain adaptation literature is to learn a representation of the input that has the same distributions over the source and the target domain.
We show that with a probabilistic representation network, the KL term can be estimated efficiently via minibatch samples.
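A hedged sketch of estimating such a KL term from minibatch samples, assuming a diagonal-Gaussian representation network and approximating each domain's marginal by the mixture of per-example Gaussians in the batch; the paper's exact estimator may differ:

```python
import math
import torch
import torch.distributions as D

def minibatch_kl(mu_s, std_s, mu_t, std_t, n_samples: int = 1) -> torch.Tensor:
    """Monte-Carlo estimate of KL(p_source(z) || p_target(z)) where each
    marginal is approximated by the minibatch mixture of per-example Gaussians.

    mu_s, std_s: (B_s, dim) Gaussian parameters for the source minibatch.
    mu_t, std_t: (B_t, dim) Gaussian parameters for the target minibatch.
    """
    source = D.Independent(D.Normal(mu_s, std_s), 1)
    target = D.Independent(D.Normal(mu_t, std_t), 1)
    z = source.rsample((n_samples,))                     # (n_samples, B_s, dim)

    def mixture_log_prob(dist, z):
        # Evaluate z under every mixture component, then average in log space.
        logp = dist.log_prob(z.unsqueeze(2))             # (n_samples, B_s, B)
        return torch.logsumexp(logp, dim=-1) - math.log(logp.shape[-1])

    return (mixture_log_prob(source, z) - mixture_log_prob(target, z)).mean()
```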
arXiv Detail & Related papers (2021-06-14T22:24:23Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when the bias lies only in the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - Understanding Anomaly Detection with Deep Invertible Networks through
Hierarchies of Distributions and Features [4.25227087152716]
Convolutional networks learn similar low-level feature distributions when trained on any natural image dataset.
When the discriminative features between inliers and outliers are on a high-level, anomaly detection becomes particularly challenging.
We propose two methods to remove the negative impact of model bias and domain prior on detecting high-level differences.
arXiv Detail & Related papers (2020-06-18T20:56:14Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16 percentage points.
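As a generic illustration of smoothing embeddings over a similarity graph, the sketch below builds an RBF affinity matrix and applies a closed-form propagation step; both choices are assumptions, not the paper's exact operator:

```python
import torch

def propagate_embeddings(z: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Smooth a batch of embeddings z (n, dim) over a pairwise similarity graph."""
    # RBF affinities with a zeroed diagonal, then symmetric normalization.
    d2 = torch.cdist(z, z) ** 2
    a = torch.exp(-d2 / d2.mean()).fill_diagonal_(0.0)
    deg = a.sum(dim=1)
    a_norm = a / torch.sqrt(deg.unsqueeze(0) * deg.unsqueeze(1))
    # Label-propagation-style closed form: (I - alpha * A_norm)^{-1} z.
    eye = torch.eye(z.shape[0], device=z.device)
    return torch.linalg.solve(eye - alpha * a_norm, z)
```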
arXiv Detail & Related papers (2020-03-09T13:51:09Z) - Regularizing activations in neural networks via distribution matching with the Wasserstein metric [9.442063850095808]
We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution.
PER randomly projects activations onto a one-dimensional space and computes the regularization loss in the projected space.
We evaluate the proposed method on the image classification task and the word-level language modeling task.
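A minimal sketch of the projection idea: push activations toward a standard normal by comparing random one-dimensional projections against Gaussian samples with a sorted (1-D Wasserstein) distance; the paper's PER loss may use a different per-projection discrepancy:

```python
import torch

def projection_regularizer(activations: torch.Tensor, n_projections: int = 16) -> torch.Tensor:
    """Average squared 1-D Wasserstein distance between random projections of
    the activations and samples from a standard normal."""
    b, d = activations.shape
    dirs = torch.randn(d, n_projections, device=activations.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)         # random unit directions
    proj = activations @ dirs                            # (b, n_projections)
    ref = torch.randn_like(proj)                         # N(0, I) projects to N(0, 1)
    # Sorting gives the optimal 1-D coupling between the two empirical samples.
    return ((proj.sort(dim=0).values - ref.sort(dim=0).values) ** 2).mean()
```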
arXiv Detail & Related papers (2020-02-13T06:42:01Z)