Grounding Representation Similarity with Statistical Testing
- URL: http://arxiv.org/abs/2108.01661v1
- Date: Tue, 3 Aug 2021 17:58:16 GMT
- Title: Grounding Representation Similarity with Statistical Testing
- Authors: Frances Ding, Jean-Stanislas Denain, Jacob Steinhardt
- Abstract summary: We argue that measures should have sensitivity to changes that affect functional behavior, and specificity against changes that do not.
We quantify this through a variety of functional behaviors including probing accuracy and robustness to distribution shift.
We find that current metrics exhibit different weaknesses, note that a classical baseline performs surprisingly well, and highlight settings where all metrics appear to fail.
- Score: 8.296135566684065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To understand neural network behavior, recent works quantitatively compare
different networks' learned representations using canonical correlation
analysis (CCA), centered kernel alignment (CKA), and other dissimilarity
measures. Unfortunately, these widely used measures often disagree on
fundamental observations, such as whether deep networks differing only in
random initialization learn similar representations. These disagreements raise
the question: which, if any, of these dissimilarity measures should we believe?
We provide a framework to ground this question through a concrete test:
measures should have sensitivity to changes that affect functional behavior,
and specificity against changes that do not. We quantify this through a variety
of functional behaviors including probing accuracy and robustness to
distribution shift, and examine changes such as varying random initialization
and deleting principal components. We find that current metrics exhibit
different weaknesses, note that a classical baseline performs surprisingly
well, and highlight settings where all metrics appear to fail, thus providing a
challenge set for further improvement.
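As background for the measures the abstract names, here is a minimal NumPy sketch of linear CKA between two representation matrices (rows are examples, columns are features); this is the standard linear-kernel formulation, not the authors' released code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X (n x d1) and Y (n x d2).

    Returns a value in [0, 1]; 1 means the representations match up to
    orthogonal transformation and isotropic scaling.
    """
    # Center each feature column.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross- and self-similarity terms.
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    self_x = np.linalg.norm(X.T @ X, "fro")
    self_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (self_x * self_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal map
print(linear_cka(X, X @ Q))  # ~1.0: invariant to orthogonal transforms
```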
Related papers
- The diameter of a stochastic matrix: A new measure for sensitivity analysis in Bayesian networks [1.2699007098398807]
We argue that robustness methods based on the familiar total variation distance provide simple and more valuable bounds on robustness to misspecification.
We introduce a novel measure of dependence in conditional probability tables called the diameter to derive such bounds.
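A hedged sketch of how such a diameter could be computed, assuming it is the largest total variation distance between rows of a row-stochastic conditional probability table; this reading of the definition, and the function names, are ours rather than the paper's.

```python
import numpy as np
from itertools import combinations

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def diameter(cpt):
    """Max pairwise TV distance between rows of a row-stochastic matrix.

    A diameter of 0 means the child is independent of the parent
    configuration; a diameter of 1 means maximal dependence.
    """
    rows = np.asarray(cpt)
    return max(tv_distance(p, q) for p, q in combinations(rows, 2))

cpt = np.array([[0.9, 0.1],   # P(child | parent = 0)
                [0.2, 0.8]])  # P(child | parent = 1)
print(diameter(cpt))  # 0.7
```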
arXiv Detail & Related papers (2024-07-05T17:22:12Z)
- GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines the usage of gradient information and invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
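A rough sketch of how the two named ingredients might combine; the scoring rule, the flip transformation, and the additive combination below are illustrative assumptions, not the GIT algorithm itself.

```python
import torch
import torch.nn.functional as F

def suspicion_score(model, x):
    """Toy detector combining gradient information with an invariance check.

    x: batch of images, shape (n, c, h, w). Higher scores flag inputs whose
    predictions look less trustworthy. Illustrative combination only.
    """
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    pred = logits.argmax(dim=1)
    # Gradient information: norm of the loss gradient w.r.t. the input,
    # evaluated at the model's own prediction (no labels needed).
    loss = F.cross_entropy(logits, pred)
    (grad,) = torch.autograd.grad(loss, x)
    grad_norm = grad.flatten(1).norm(dim=1)
    # Invariance transformation: does the prediction survive a horizontal
    # flip? (Assumes flips are label-preserving for the task at hand.)
    with torch.no_grad():
        flipped_pred = model(torch.flip(x, dims=[3])).argmax(dim=1)
    disagreement = (flipped_pred != pred).float()
    return grad_norm + disagreement  # naive additive combination
```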
arXiv Detail & Related papers (2023-07-05T22:04:38Z)
- Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection [24.249552791014644]
Change detection is a widely adopted technique in remote sensing imagery (RSI) analysis.
We propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling.
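For orientation, a generic sketch of the difference-information backbone such methods build on (a shared encoder plus feature differencing); the paper's specific Alignment, Perturbation and Decoupling operations are more involved and are not reproduced here.

```python
import torch
import torch.nn as nn

class DifferenceChangeHead(nn.Module):
    """Per-pixel change detection from bi-temporal feature differences.

    A shared encoder puts both time steps in one feature space; a 1x1
    conv then classifies change from the absolute feature difference.
    """

    def __init__(self, encoder, feat_dim):
        super().__init__()
        self.encoder = encoder  # shared weights for both acquisition dates
        self.classifier = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)
        return torch.sigmoid(self.classifier(torch.abs(f1 - f2)))
```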
arXiv Detail & Related papers (2023-05-30T03:39:53Z)
- Variational Inference: Posterior Threshold Improves Network Clustering Accuracy in Sparse Regimes [2.5782420501870296]
This paper proposes a simple way to improve the variational inference method by hard thresholding the posterior of the community assignment after each iteration.
We show that the proposed method converges and can accurately recover the true community labels, even when the average node degree of the network is bounded.
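The proposed modification is simple to state in code: after each variational update, zero out small posterior entries and renormalize. A sketch assuming the posterior is an n x K matrix of community-assignment probabilities; the threshold value is an illustrative choice.

```python
import numpy as np

def threshold_posterior(pi, tau=1e-3):
    """Hard-threshold a variational posterior over community assignments.

    pi:  (n, K) matrix; row i is node i's posterior over K communities.
    tau: entries below tau are zeroed, then rows are renormalized.
         (The abstract's idea; this particular tau is illustrative.)
    """
    pi = np.where(pi < tau, 0.0, pi)
    row_sums = pi.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # guard fully-thresholded rows
    return pi / row_sums

# Inside a VI loop (sketch; `variational_update` is a hypothetical step):
# for _ in range(max_iter):
#     pi = variational_update(pi, adjacency)
#     pi = threshold_posterior(pi)
```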
arXiv Detail & Related papers (2023-01-12T00:24:54Z)
- Reliability of CKA as a Similarity Measure in Deep Learning [17.555458413538233]
We present analysis that characterizes CKA sensitivity to a large class of simple transformations.
We investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results.
Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models.
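A toy demonstration in this spirit, using the standard linear CKA formula: rescaling a single feature direction drives the CKA value down even though the two representations are related by an invertible diagonal map, so a linear probe can behave identically on both.

```python
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    return (np.linalg.norm(Y.T @ X, "fro") ** 2
            / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
Y = X.copy()
Y[:, 0] *= 100.0            # blow up one direction: invertible, probe-recoverable
print(linear_cka(X, X))     # 1.0
print(linear_cka(X, Y))     # well below 1 despite identical information content
```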
arXiv Detail & Related papers (2022-10-28T14:32:52Z)
- E-detectors: a nonparametric framework for sequential change detection [86.15115654324488]
We develop a fundamentally new and general framework for sequential change detection.
Our procedures come with clean, nonasymptotic bounds on the average run length.
We show how to design their mixtures in order to achieve both statistical and computational efficiency.
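As a concrete member of this family in the simplest setting, a Shiryaev-Roberts-style detector that sums likelihood-ratio e-processes started at every time step; the known-Gaussian assumption below is ours, whereas the paper's framework is nonparametric.

```python
import numpy as np
from scipy.stats import norm

def sr_e_detector(xs, pre_mean=0.0, post_mean=1.0, sigma=1.0, alpha=0.01):
    """Shiryaev-Roberts-style e-detector for a Gaussian mean shift.

    Sums likelihood-ratio e-processes started at every time step; alarms
    once the sum exceeds 1/alpha. Assumes known pre/post distributions.
    Returns the first alarm time, or None.
    """
    r = 0.0
    for t, x in enumerate(xs, start=1):
        lr = norm.pdf(x, post_mean, sigma) / norm.pdf(x, pre_mean, sigma)
        r = (1.0 + r) * lr  # restart-summed likelihood-ratio recursion
        if r >= 1.0 / alpha:
            return t
    return None

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
print(sr_e_detector(data))  # alarm shortly after the change at t = 100
```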
arXiv Detail & Related papers (2022-03-07T17:25:02Z)
- Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
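A condensed sketch of the idea as we read it: treat per-channel feature mean and standard deviation as Gaussian variables whose uncertainty is estimated across the batch, and re-normalize features with resampled statistics during training. Shapes and the resampling scheme below are assumptions, not the official code.

```python
import torch
import torch.nn as nn

class UncertainStats(nn.Module):
    """Perturb per-channel feature statistics during training."""

    def __init__(self, eps=1e-6, p=0.5):
        super().__init__()
        self.eps, self.p = eps, p

    def forward(self, x):  # x: (n, c, h, w)
        if not self.training or torch.rand(1).item() > self.p:
            return x
        mu = x.mean(dim=(2, 3), keepdim=True)                 # (n, c, 1, 1)
        sig = (x.var(dim=(2, 3), keepdim=True) + self.eps).sqrt()
        # Uncertainty of the statistics, estimated across the batch.
        mu_std = mu.std(dim=0, keepdim=True)
        sig_std = sig.std(dim=0, keepdim=True)
        new_mu = mu + torch.randn_like(mu) * mu_std
        new_sig = sig + torch.randn_like(sig) * sig_std
        # Re-normalize with the synthesized statistics.
        return (x - mu) / sig * new_sig + new_mu
```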
arXiv Detail & Related papers (2022-02-08T16:09:12Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Tangent Space Sensitivity and Distribution of Linear Regions in ReLU Networks [0.0]
We consider adversarial stability in the tangent space and propose tangent sensitivity to characterize it.
We derive several easily computable bounds and empirical measures for feed-forward fully connected ReLU networks.
Our experiments suggest that even simple bounds and measures are associated with the empirical generalization gap.
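One easily computable measure in this spirit is the norm of the input-output Jacobian at a data point, which probes the local linear region of a ReLU network; a minimal autograd sketch follows, with the caveat that the paper's exact tangent-sensitivity quantity may be defined differently.

```python
import torch
import torch.nn as nn

def jacobian_frobenius_norm(model, x):
    """Frobenius norm of the input-output Jacobian at a single input x.

    For a ReLU network this reflects the local linear region around x;
    a simple empirical sensitivity measure.
    """
    jac = torch.autograd.functional.jacobian(model, x.unsqueeze(0))
    return jac.norm().item()

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
x = torch.randn(10)
print(jacobian_frobenius_norm(model, x))
```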
arXiv Detail & Related papers (2020-06-11T20:02:51Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
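One concrete instance of such a likelihood is the classic multiple-instance case, where a set's label is the OR of latent instance labels; the bag-level log-likelihood below is a single differentiable special case of the paper's more general framework.

```python
import torch

def bag_log_likelihood(instance_probs, bag_label):
    """Log-likelihood of a bag label under the 'OR' aggregation model.

    instance_probs: (m,) predicted P(y_i = 1) for instances in the bag.
    bag_label: 1 if any instance is positive, else 0. The bag is positive
    with probability 1 - prod_i (1 - p_i); maximize this across bags.
    """
    log_all_neg = torch.log1p(-instance_probs).sum()  # log prod (1 - p_i)
    if bag_label == 0:
        return log_all_neg
    return torch.log1p(-torch.exp(log_all_neg))  # log(1 - prod(1 - p_i))

probs = torch.tensor([0.1, 0.8, 0.2], requires_grad=True)
loss = -bag_log_likelihood(probs, bag_label=1)
loss.backward()  # gradients flow back to the instance-level model
```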
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
- Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective.
We assume that the underlying pre- and post-change distributions are unknown and can only be learned from the available training sequences.
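A toy version of that pipeline, with Gaussian plug-in estimates standing in for the paper's information-theoretic treatment: fit pre- and post-change models on the training sequences, then choose the split of the test sequence that maximizes total log-likelihood.

```python
import numpy as np
from scipy.stats import norm

def estimate_change_point(test_seq, pre_train, post_train):
    """Offline change-point location with distributions learned from data.

    Fits Gaussian models to the pre-/post-change training sequences, then
    returns the split index of test_seq maximizing total log-likelihood.
    """
    pre = norm(np.mean(pre_train), np.std(pre_train))
    post = norm(np.mean(post_train), np.std(post_train))
    ll_pre = np.cumsum(pre.logpdf(test_seq))            # prefix sums
    ll_post = np.cumsum(post.logpdf(test_seq)[::-1])[::-1]  # suffix sums
    n = len(test_seq)
    # Candidate k: samples [0, k) are pre-change, [k, n) are post-change.
    scores = [(ll_pre[k - 1] if k > 0 else 0.0)
              + (ll_post[k] if k < n else 0.0) for k in range(n + 1)]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
pre, post = rng.normal(0, 1, 200), rng.normal(2, 1, 200)
test = np.concatenate([rng.normal(0, 1, 60), rng.normal(2, 1, 40)])
print(estimate_change_point(test, pre, post))  # close to 60
```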
arXiv Detail & Related papers (2020-03-13T23:39:40Z)