Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests
- URL: http://arxiv.org/abs/2107.06929v1
- Date: Wed, 14 Jul 2021 18:23:24 GMT
- Title: Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests
- Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye
- Abstract summary: In military sensor networks, users will want to detect when one or more of the sensors has been compromised.
We first define a formalization of this problem as multiple conditional distribution hypothesis tests.
For both efficiency and flexibility, we propose a test statistic based on the density model score function.
- Score: 12.468665026043382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While previous distribution shift detection approaches can identify if a
shift has occurred, these approaches cannot localize which specific features
have caused a distribution shift -- a critical step in diagnosing or fixing any
underlying issue. For example, in military sensor networks, users will want to
detect when one or more of the sensors has been compromised, and critically,
they will want to know which specific sensors might be compromised. Thus, we
first define a formalization of this problem as multiple conditional
distribution hypothesis tests and propose both non-parametric and parametric
statistical tests. For both efficiency and flexibility, we then propose to use
a test statistic based on the density model score function (i.e., the gradient
of the log density with respect to the input), which yields test statistics for
all dimensions from a single forward and backward pass. Any density model could be
used for computing the necessary statistics including deep density models such
as normalizing flows or autoregressive models. We additionally develop methods
for identifying when and where a shift occurs in multivariate time-series data
and show results for multiple scenarios using realistic attack models on both
simulated and real-world data.
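To make the score-based statistic concrete, here is a minimal sketch, assuming a PyTorch density model that exposes log_prob; the full-covariance Gaussian stand-in, the mean-absolute-score aggregation per feature, and the bootstrap threshold are illustrative assumptions, not the paper's exact test. A deep density model such as a normalizing flow could be substituted wherever log_prob is evaluated.

```python
import torch

def fit_gaussian(x):
    """Fit a full-covariance Gaussian to reference data (a stand-in for a
    deep density model such as a normalizing flow)."""
    mean = x.mean(dim=0)
    cov = torch.cov(x.T) + 1e-4 * torch.eye(x.shape[1])
    return torch.distributions.MultivariateNormal(mean, cov)

def feature_scores(density, x):
    """Per-feature score statistic: mean |d log p(x) / d x_j| over a window.
    One log_prob evaluation (forward) and one backward pass give the gradient
    for every dimension at once."""
    x = x.clone().requires_grad_(True)
    log_p = density.log_prob(x).sum()
    (grad,) = torch.autograd.grad(log_p, x)
    return grad.abs().mean(dim=0)            # shape: (n_features,)

def detect_shifted_features(x_ref, x_query, n_boot=200, alpha=0.05):
    """Flag features whose query-window statistic is extreme relative to a
    bootstrap of same-sized windows drawn from the reference data."""
    density = fit_gaussian(x_ref)
    boot = torch.stack([
        feature_scores(density, x_ref[torch.randint(len(x_ref), (len(x_query),))])
        for _ in range(n_boot)
    ])
    query = feature_scores(density, x_query)
    p = (boot >= query).float().mean(dim=0)  # one-sided empirical p-value
    p = torch.minimum(p, 1.0 - p) * 2.0      # two-sided
    return p < alpha, p

if __name__ == "__main__":
    torch.manual_seed(0)
    x_ref = torch.randn(2000, 5)
    x_query = torch.randn(200, 5)
    x_query[:, 2] += 2.0                     # simulate a compromised feature
    flagged, pvals = detect_shifted_features(x_ref, x_query)
    print("flagged features:", flagged.nonzero().flatten().tolist())
```

The point mirrored from the abstract is that a single log_prob evaluation and one backward pass produce the gradient for every feature simultaneously, so the per-dimension statistics come essentially for free once the density model is trained.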
Related papers
- Unsupervised Anomaly and Change Detection with Multivariate Gaussianization [8.508880949780893]
Anomaly detection is a challenging problem given the high-dimensionality of the data.
We propose an unsupervised method for detecting anomalies and changes in remote sensing images.
We show the efficiency of the method in experiments involving both anomaly detection and change detection.
arXiv Detail & Related papers (2022-04-12T10:52:33Z)
- Model-agnostic out-of-distribution detection using combined statistical tests [15.27980070479021]
We present simple methods for out-of-distribution detection using a trained generative model.
We combine a classical parametric test (Rao's score test) with the recently introduced typicality test.
Despite their simplicity and generality, these methods can be competitive with model-specific out-of-distribution detection algorithms.
arXiv Detail & Related papers (2022-03-02T13:32:09Z)
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z)
- Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Nonlinear Distribution Regression for Remote Sensing Applications [6.664736150040092]
In many remote sensing applications one wants to estimate variables or parameters of interest from observations.
Standard algorithms such as neural networks, random forests, or Gaussian processes are readily available to relate the two.
This paper introduces a nonlinear (kernel-based) method for distribution regression that solves the previous problems without making any assumption on the statistics of the grouped data.
arXiv Detail & Related papers (2020-12-07T22:04:43Z)
- Density of States Estimation for Out-of-Distribution Detection [69.90130863160384]
DoSE, the density of states estimator, is an unsupervised approach to out-of-distribution detection.
We demonstrate DoSE's state-of-the-art performance against other unsupervised OOD detectors.
arXiv Detail & Related papers (2020-06-16T16:06:25Z)
- A Causal Direction Test for Heterogeneous Populations [10.653162005300608]
Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications.
We show that when the homogeneity assumption is violated, causal models developed based on such assumption can fail to identify the correct causal direction.
We propose an adjustment to a commonly used causal direction test statistic by using a $k$-means type clustering algorithm.
arXiv Detail & Related papers (2020-06-08T18:59:14Z)
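As referenced in the Leveraging Unlabeled Data entry above, below is a minimal sketch of the thresholded-confidence idea, assuming NumPy and toy synthetic confidences; the calibration rule and the names used here are illustrative, not the authors' released code.

```python
import numpy as np

def learn_threshold(source_conf, source_correct):
    """Pick a threshold so that the fraction of labeled source points with
    confidence above it matches the observed source accuracy."""
    acc = source_correct.mean()
    return np.quantile(source_conf, 1.0 - acc)

def predict_target_accuracy(threshold, target_conf):
    """Predicted accuracy = fraction of unlabeled target points whose
    confidence exceeds the learned threshold."""
    return (target_conf > threshold).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source_conf = rng.uniform(0.5, 1.0, 5000)               # toy confidences
    source_correct = rng.random(5000) < source_conf         # toy correctness labels
    target_conf = rng.uniform(0.4, 1.0, 2000)               # shifted target confidences
    t = learn_threshold(source_conf, source_correct)
    print(f"estimated target accuracy: {predict_target_accuracy(t, target_conf):.3f}")
```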