Deep Learning Through the Lens of Example Difficulty
- URL: http://arxiv.org/abs/2106.09647v2
- Date: Fri, 18 Jun 2021 16:36:37 GMT
- Title: Deep Learning Through the Lens of Example Difficulty
- Authors: Robert J. N. Baldock, Hartmut Maennel and Behnam Neyshabur
- Abstract summary: We introduce a measure of the computational difficulty of making a prediction for a given input: the (effective) prediction depth.
Our investigation reveals surprising yet simple relationships between the prediction depth of a given input and the model's uncertainty, confidence, accuracy and speed of learning for that data point.
- Score: 21.522182447513632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work on understanding deep learning often employs measures that
compress all data-dependent information into a few numbers. In this work, we
adopt a perspective based on the role of individual examples. We introduce a
measure of the computational difficulty of making a prediction for a given
input: the (effective) prediction depth. Our extensive investigation reveals
surprising yet simple relationships between the prediction depth of a given
input and the model's uncertainty, confidence, accuracy and speed of learning
for that data point. We further categorize difficult examples into three
interpretable groups, demonstrate how these groups are processed differently
inside deep models and showcase how this understanding allows us to improve
prediction accuracy. Insights from our study lead to a coherent view of a
number of separately reported phenomena in the literature: early layers
generalize while later layers memorize; early layers converge faster and
networks learn easy data and simple functions first.
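As a rough illustration of the paper's central quantity, the sketch below estimates a prediction depth by fitting one k-NN probe per layer on a support set's hidden representations and reporting the earliest layer from which every probe already agrees with the final probe's vote (the final probe standing in for the network's own prediction, in the spirit of the paper's k-NN probes on intermediate representations). The toy random ReLU network, synthetic labels, and probe settings are assumptions made purely for illustration, not the authors' experimental setup.

```python
# Minimal sketch (not the authors' code) of the "prediction depth" idea:
# fit a k-NN probe on each layer's representations of a support set, then
# define an example's prediction depth as the earliest layer after which
# every subsequent probe already agrees with the final probe's prediction.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def forward_with_activations(x, weights):
    """Return the post-ReLU representation after each layer of a toy MLP."""
    activations = []
    h = x
    for W in weights:
        h = np.maximum(h @ W, 0.0)
        activations.append(h)
    return activations

def prediction_depth(example_acts, probes):
    """Earliest (0-indexed) layer from which all k-NN probes agree with the final probe."""
    votes = [probe.predict(act[None, :])[0] for probe, act in zip(probes, example_acts)]
    final_vote = votes[-1]
    depth = len(votes) - 1
    for layer in range(len(votes) - 1, -1, -1):
        if votes[layer] != final_vote:
            break
        depth = layer
    return depth

# Toy stand-in for a trained network: three random ReLU layers.
dims = [10, 16, 16, 16]
weights = [rng.normal(size=(dims[i], dims[i + 1])) for i in range(len(dims) - 1)]

# Support set used to fit one k-NN probe per layer (synthetic labels for illustration).
X_support = rng.normal(size=(200, dims[0]))
y_support = (X_support[:, 0] > 0).astype(int)
support_acts = forward_with_activations(X_support, weights)
probes = [KNeighborsClassifier(n_neighbors=5).fit(act, y_support) for act in support_acts]

# Prediction depth of a single held-out input.
x_new = rng.normal(size=(1, dims[0]))
acts_new = [act[0] for act in forward_with_activations(x_new, weights)]
print("prediction depth:", prediction_depth(acts_new, probes))
```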
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Predicting and analyzing memorization within fine-tuned Large Language Models [0.0]
Large Language Models memorize a significant proportion of their training data, which poses a serious threat if that data is disclosed at inference time.
We propose a new approach based on sliced mutual information to detect memorized samples a priori.
We obtain strong empirical results, paving the way for systematic inspection and protection of these vulnerable samples before memorization happens.
arXiv Detail & Related papers (2024-09-27T15:53:55Z) - The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning [32.15608637930748]
We show that there is a trade-off between the two desiderata, so one may not be able to achieve both simultaneously.
We provide an analysis using a theoretical data model and show that, while more diverse pre-training data yield more diverse features for different tasks, they put less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z) - When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples.
Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z) - What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation [37.5845376458136]
Deep learning algorithms are well-known to have a propensity for fitting the training data very well.
Such fitting requires memorization of training data labels.
We propose a theoretical explanation for this phenomenon based on a combination of two insights.
arXiv Detail & Related papers (2020-08-09T10:12:28Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z) - Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information [102.18616819054368]
We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction.
Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks.
Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice.
arXiv Detail & Related papers (2020-03-12T18:04:20Z)
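For the Post-Estimation Smoothing entry above, the sketch below illustrates the general idea of a smoothing step that is decoupled from the predictor: raw per-example predictions are averaged over their neighbours in a structural index (here a 1-D timestamp) with a Gaussian kernel. The kernel choice, bandwidth, and toy data are illustrative assumptions rather than the paper's exact operator.

```python
# A minimal, generic sketch of post-estimation smoothing: train any predictor,
# then smooth its raw predictions with a kernel-weighted average over a
# structural index. The Gaussian kernel, bandwidth, and toy signal are
# assumptions for illustration only.
import numpy as np

def smooth_predictions(raw_preds, index, bandwidth=1.0):
    """Replace each prediction with a kernel-weighted average of its index-neighbours."""
    diffs = index[:, None] - index[None, :]             # pairwise index distances
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)   # Gaussian kernel weights
    weights /= weights.sum(axis=1, keepdims=True)       # row-normalise
    return weights @ raw_preds

# Toy example: noisy predictions of a smooth signal observed over time.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)                          # structural index (timestamps)
raw = np.sin(t) + rng.normal(scale=0.5, size=t.shape)    # stand-in for a model's noisy outputs
smoothed = smooth_predictions(raw, t, bandwidth=0.5)

print("raw MSE:     ", np.mean((raw - np.sin(t)) ** 2))
print("smoothed MSE:", np.mean((smoothed - np.sin(t)) ** 2))
```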