The Curse of Performance Instability in Analysis Datasets: Consequences,
Source, and Suggestions
- URL: http://arxiv.org/abs/2004.13606v2
- Date: Mon, 16 Nov 2020 02:22:35 GMT
- Title: The Curse of Performance Instability in Analysis Datasets: Consequences,
Source, and Suggestions
- Authors: Xiang Zhou, Yixin Nie, Hao Tan, Mohit Bansal
- Abstract summary: We find that the performance of state-of-the-art models on Natural Language Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be highly unstable.
This raises three questions: (1) How will the instability affect the reliability of the conclusions drawn based on these analysis sets?
We give both theoretical explanations and empirical evidence regarding the source of the instability.
- Score: 93.62888099134028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We find that the performance of state-of-the-art models on Natural Language
Inference (NLI) and Reading Comprehension (RC) analysis/stress sets can be
highly unstable. This raises three questions: (1) How will the instability
affect the reliability of the conclusions drawn based on these analysis sets?
(2) Where does this instability come from? (3) How should we handle this
instability and what are some potential solutions? For the first question, we
conduct a thorough empirical study over analysis sets and find that in addition
to the unstable final performance, the instability exists all along the
training curve. We also observe lower-than-expected correlations between the
analysis validation set and standard validation set, questioning the
effectiveness of the current model-selection routine. Next, to answer the
second question, we give both theoretical explanations and empirical evidence
regarding the source of the instability, demonstrating that the instability
mainly comes from high inter-example correlations within analysis sets.
Finally, for the third question, we discuss an initial attempt to mitigate the
instability and suggest guidelines for future work such as reporting the
decomposed variance for more interpretable results and fair comparison across
models. Our code is publicly available at:
https://github.com/owenzx/InstabilityAnalysis
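The abstract's key claim, that instability mainly comes from high inter-example correlations within analysis sets, follows from the identity Var(accuracy) = (1/n²) Σᵢⱼ Cov(xᵢ, xⱼ): when per-example correctness indicators are correlated across runs, the off-diagonal covariance terms dominate and the variance no longer shrinks with set size n. The sketch below (illustrative only, not the authors' code; all names and the simulation setup are assumptions) simulates correlated per-example correctness across training runs and decomposes the accuracy variance into an independent (diagonal) part and a correlated (off-diagonal) part, in the spirit of the paper's suggestion to report decomposed variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_examples = 20, 500  # hypothetical: 20 training seeds, 500-example analysis set

def simulate(rho):
    """Simulate per-example correctness (1/0) with inter-example correlation rho."""
    shared = rng.normal(size=(n_runs, 1))             # run-level effect (e.g. random seed)
    noise = rng.normal(size=(n_runs, n_examples))     # independent per-example noise
    scores = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise
    return (scores > 0).astype(float)                 # threshold into correct/incorrect

for rho in (0.0, 0.5):
    correct = simulate(rho)
    acc = correct.mean(axis=1)                        # per-run accuracy on the analysis set
    cov = np.cov(correct, rowvar=False)               # covariance between example indicators
    # Decomposition: Var(acc) = (1/n^2) * [ trace(cov) + off-diagonal sum ]
    indep_part = np.trace(cov) / n_examples**2
    corr_part = (cov.sum() - np.trace(cov)) / n_examples**2
    print(f"rho={rho}: Var(acc)={acc.var(ddof=1):.5f}  "
          f"independent={indep_part:.5f}  correlated={corr_part:.5f}")
```

With rho = 0 the correlated part is near zero and the accuracy variance is tiny (order 1/n); with rho = 0.5 the off-diagonal part dominates, reproducing the qualitative finding that correlated analysis-set examples keep run-to-run performance unstable no matter how large the set is.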
Related papers
- Score matching through the roof: linear, nonlinear, and latent variables causal discovery [18.46845413928147]
Causal discovery from observational data holds great promise.
Existing methods rely on strong assumptions about the underlying causal structure.
We propose a flexible algorithm for causal discovery across linear, nonlinear, and latent variable models.
arXiv Detail & Related papers (2024-07-26T14:09:06Z)
- Self-Compatibility: Evaluating Causal Discovery without Ground Truth [28.72650348646176]
We propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth.
Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables.
We prove that detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects.
arXiv Detail & Related papers (2023-07-18T18:59:42Z)
- Identifying Weight-Variant Latent Causal Models [82.14087963690561]
We find that transitivity plays a key role in impeding the identifiability of latent causal representations.
Under some mild assumptions, we can show that the latent causal representations can be identified up to trivial permutation and scaling.
We propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them.
arXiv Detail & Related papers (2022-08-30T11:12:59Z)
- Positivity Validation Detection and Explainability via Zero Fraction Multi-Hypothesis Testing and Asymmetrically Pruned Decision Trees [7.688686113950607]
Positivity is one of the three conditions for causal inference from observational data.
To democratize causal inference for non-experts, an algorithm for testing positivity is required.
arXiv Detail & Related papers (2021-11-07T08:32:58Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- Disentangling Observed Causal Effects from Latent Confounders using Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions.
We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can pick up spurious correlations during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
- Reachable Sets of Classifiers and Regression Models: (Non-)Robustness Analysis and Robust Training [1.0878040851638]
We analyze and enhance robustness properties of both classifiers and regression models.
Specifically, we verify (non-)robustness, propose a robust training procedure, and show that our approach outperforms adversarial attacks.
We also provide techniques to distinguish between reliable and non-reliable predictions for unlabeled inputs, to quantify the influence of each feature on a prediction, and to compute a feature ranking.
arXiv Detail & Related papers (2020-07-28T10:58:06Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional-independence-test-based algorithm that separates causal variables using a seed variable as prior knowledge, and adopts them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.