Dropout as a Regularizer of Interaction Effects
- URL: http://arxiv.org/abs/2007.00823v2
- Date: Sun, 17 Oct 2021 15:05:31 GMT
- Title: Dropout as a Regularizer of Interaction Effects
- Authors: Benjamin Lengerich, Eric P. Xing, Rich Caruana
- Abstract summary: Dropout is a regularizer against higher-order interactions.
We prove this perspective analytically and empirically.
We also find that it is difficult for other regularizers to obtain the same selective pressure against high-order interactions.
- Score: 76.84531978621143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We examine Dropout through the perspective of interactions. This view
provides a symmetry to explain Dropout: given $N$ variables, there are ${N
\choose k}$ possible sets of $k$ variables to form an interaction (i.e.
$\mathcal{O}(N^k)$); conversely, the probability an interaction of $k$
variables survives Dropout at rate $p$ is $(1-p)^k$ (decaying with $k$). These
rates effectively cancel, and so Dropout regularizes against higher-order
interactions. We prove this perspective analytically and empirically. This
perspective of Dropout as a regularizer against interaction effects has several
practical implications: (1) higher Dropout rates should be used when we need
stronger regularization against spurious high-order interactions, (2) caution
should be exercised when interpreting Dropout-based explanations and
uncertainty measures, and (3) networks trained with Input Dropout are biased
estimators. We also compare Dropout to other regularizers and find that it is
difficult to obtain the same selective pressure against high-order
interactions.
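The counting symmetry in the abstract can be sketched numerically. The following is a minimal illustration with made-up values of $N$ and $p$ (not an implementation from the paper): the ${N \choose k} = \mathcal{O}(N^k)$ growth in candidate $k$-variable interactions is set against the $(1-p)^k$ decay in the probability that all $k$ inputs survive one Dropout mask.

```python
from math import comb

def expected_surviving_interactions(n, k, p):
    """Expected number of k-variable interactions whose k inputs all
    survive a single Dropout mask at rate p: C(n, k) * (1 - p)**k."""
    return comb(n, k) * (1 - p) ** k

# Illustrative values, not from the paper: n = 100 inputs, p = 0.5.
for k in (1, 2, 4, 8):
    candidates = comb(100, k)          # grows roughly like O(n^k)
    survival = (1 - 0.5) ** k          # decays geometrically in k
    print(f"k={k}: {candidates} candidate interactions, "
          f"survival prob {survival:.4f}, "
          f"expected survivors {candidates * survival:.1f}")
```

Raising the Dropout rate $p$ shrinks $(1-p)^k$ faster in $k$, which is the sense in which higher rates give stronger regularization against high-order interactions.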
Related papers
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization [60.176008034221404]
Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences.
Prior work has observed that the likelihood of preferred responses often decreases during training.
We demonstrate that likelihood displacement can be catastrophic, shifting probability mass from preferred responses to responses with an opposite meaning.
arXiv Detail & Related papers (2024-10-11T14:22:44Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Balancing central and marginal rejection when combining independent significance tests [0.0]
A common approach to evaluating the significance of a collection of $p$-values combines them with a pooling function.
A series of alternative hypotheses are introduced that communicate the strength and prevalence of non-null evidence in the $p$-values.
It is proven that central rejection is always greater than or equal to marginal rejection, motivating a quotient to measure the balance between the two.
arXiv Detail & Related papers (2023-10-25T12:45:49Z)
- Predicting Rare Events by Shrinking Towards Proportional Odds [1.599072005190786]
We show that the more abundant data in earlier steps may be leveraged to improve estimation of probabilities of rare events.
We present PRESTO, a relaxation of the proportional odds model for ordinal regression.
We prove that PRESTO consistently estimates the decision boundary weights under a sparsity assumption.
arXiv Detail & Related papers (2023-05-30T02:50:08Z)
- WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation [57.11538133231843]
Keyphrase generation aims to automatically generate short phrases summarizing an input document.
The recently emerged ONE2SET paradigm generates keyphrases as a set and has achieved competitive performance.
We propose WR-ONE2SET which extends ONE2SET with an adaptive instance-level cost Weighting strategy and a target Re-assignment mechanism.
arXiv Detail & Related papers (2022-11-13T09:56:24Z)
- The Curse of Passive Data Collection in Batch Reinforcement Learning [82.6026077420886]
In high-stakes applications, active experimentation may be considered too risky, so data are often collected passively.
While in simple cases, such as in bandits, passive and active data collection are similarly effective, the price of passive sampling can be much higher when collecting data from a system with controlled states.
arXiv Detail & Related papers (2021-06-18T07:54:23Z)
- $PredDiff$: Explanations and Interactions from Conditional Expectations [0.3655021726150368]
$PredDiff$ is a model-agnostic, local attribution method rooted in probability theory.
In this work, we clarify properties of $PredDiff$ and put forward several extensions of the original formalism.
arXiv Detail & Related papers (2021-02-26T14:46:47Z)
- Towards Defending Multiple $\ell_p$-norm Bounded Adversarial Perturbations via Gated Batch Normalization [120.99395850108422]
Existing adversarial defenses typically improve model robustness against individual specific perturbations.
Some recent methods improve model robustness against adversarial attacks in multiple $\ell_p$ balls, but their performance against each perturbation type is still far from satisfactory.
We propose Gated Batch Normalization (GBN) to adversarially train a perturbation-invariant predictor that defends against multiple $\ell_p$-bounded adversarial perturbations.
arXiv Detail & Related papers (2020-12-03T02:26:01Z)
- The Implicit and Explicit Regularization Effects of Dropout [43.431343291010734]
Dropout is a widely used regularization technique, often required to obtain state-of-the-art performance for a number of architectures.
This work demonstrates that dropout introduces two distinct but entangled regularization effects.
arXiv Detail & Related papers (2020-02-28T18:31:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.