Model-agnostic Measure of Generalization Difficulty
- URL: http://arxiv.org/abs/2305.01034v2
- Date: Fri, 2 Jun 2023 23:03:46 GMT
- Title: Model-agnostic Measure of Generalization Difficulty
- Authors: Akhilan Boopathy, Kevin Liu, Jaedong Hwang, Shu Ge, Asaad
Mohammedsaleh, Ila Fiete
- Abstract summary: We propose the first model-agnostic measure of the inherent generalization difficulty of tasks.
Our measure quantifies the total information required to generalize well on a task minus the information provided by the data.
It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize but only polynomially in resolution per dimension.
- Score: 7.183430740278161
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The measure of a machine learning algorithm is the difficulty of the tasks it
can perform, and sufficiently difficult tasks are critical drivers of strong
machine learning models. However, quantifying the generalization difficulty of
machine learning benchmarks has remained challenging. We propose what is to our
knowledge the first model-agnostic measure of the inherent generalization
difficulty of tasks. Our inductive bias complexity measure quantifies the total
information required to generalize well on a task minus the information
provided by the data. It does so by measuring the fractional volume occupied by
hypotheses that generalize on a task given that they fit the training data. It
scales exponentially with the intrinsic dimensionality of the space over which
the model must generalize but only polynomially in resolution per dimension,
showing that tasks which require generalizing over many dimensions are
drastically more difficult than tasks involving more detail in fewer
dimensions. Our measure can be applied to compute and compare supervised
learning, reinforcement learning and meta-learning generalization difficulties
against each other. We show that applied empirically, it formally quantifies
intuitively expected trends, e.g. that in terms of required inductive bias,
MNIST < CIFAR10 < ImageNet and fully observable Markov decision processes
(MDPs) < partially observable MDPs. Further, we show that classification of
complex images < few-shot meta-learning with simple images. Our measure
provides a quantitative metric to guide the construction of more complex tasks
requiring greater inductive bias, and thereby encourages the development of
more sophisticated architectures and learning algorithms with more powerful
generalization capabilities.
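To make the fractional-volume idea concrete, here is a minimal Monte Carlo sketch under toy assumptions (a linear ground-truth task, linear classifiers sampled from the unit sphere, and ad hoc accuracy thresholds). It illustrates the quantity being measured — the fraction of data-fitting hypotheses that also generalize, converted to bits — and is not the paper's actual estimator:

```python
# Sketch: required inductive bias (in bits) ~ -log2 of the fractional volume
# of hypotheses that generalize, among those that fit the training data.
# Task, hypothesis class, and thresholds below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy task: labels given by a fixed ground-truth linear rule in d dimensions.
d = 3
w_true = rng.normal(size=d)
X_train = rng.normal(size=(20, d))
y_train = np.sign(X_train @ w_true)
X_test = rng.normal(size=(2000, d))
y_test = np.sign(X_test @ w_true)

# Hypothesis space: linear classifiers sampled uniformly from the unit sphere.
n = 100_000
W = rng.normal(size=(n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Accuracy of every sampled hypothesis on train and test sets (vectorized).
train_acc = (np.sign(X_train @ W.T) == y_train[:, None]).mean(axis=0)
test_acc = (np.sign(X_test @ W.T) == y_test[:, None]).mean(axis=0)

fits = train_acc >= 0.95                    # hypotheses that fit the data
frac = (test_acc[fits] >= 0.95).mean()      # ...and also generalize
# Note: with too few samples, frac can be 0; more samples tighten the estimate.
print(f"{fits.sum()} fitting hypotheses; fraction generalizing = {frac:.3f}")
print(f"required inductive bias ~ {-np.log2(frac):.1f} bits")
```

Under these toy assumptions, the printed bit count is the information an inductive bias must supply beyond what the training data provides; richer hypothesis spaces and higher-dimensional tasks shrink the generalizing fraction and drive the bit count up.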
Related papers
- Weak-to-Strong Generalization Through the Data-Centric Lens [12.221894353699918]
We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density.
We present a theoretical result showing that the weak-to-strong generalization benefit is a function of the overlap density, along with a regret bound for our data selection algorithm.
arXiv Detail & Related papers (2024-12-05T05:29:19Z)
- Quantifying Generalization Complexity for Large Language Models [31.721781613271066]
We introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of large language models.
Scylla disentangles generalization from memorization by assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data.
We benchmark 28 LLMs, including open-source models such as the LLaMA and Qwen families and closed-source models such as Claude and GPT.
arXiv Detail & Related papers (2024-10-02T17:25:37Z)
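For the Scylla entry above, a minimal sketch of the ID/OOD disentangling idea; the `model` callable and exact-match scoring rule are hypothetical stand-ins for the framework's actual evaluation pipeline:

```python
from typing import Callable, Sequence

def generalization_gap(
    model: Callable[[str], str],
    id_examples: Sequence[tuple[str, str]],
    ood_examples: Sequence[tuple[str, str]],
) -> dict:
    """Score a model on ID and OOD splits of the same task family."""
    def accuracy(examples: Sequence[tuple[str, str]]) -> float:
        return sum(model(x) == y for x, y in examples) / len(examples)
    id_acc, ood_acc = accuracy(id_examples), accuracy(ood_examples)
    # ID performance mixes memorization with generalization; OOD performance
    # isolates generalization, so the gap serves as a memorization signal.
    return {"id_acc": id_acc, "ood_acc": ood_acc, "gap": id_acc - ood_acc}

# Usage with a trivial stand-in "model" that echoes the last token:
echo = lambda prompt: prompt.split()[-1]
print(generalization_gap(echo, [("say a a", "a")], [("say b c", "c")]))
```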
- Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data [76.90128359866462]
We introduce an extended concept of memorization, distributional memorization, which measures the correlation between model output probabilities and pretraining data frequency.
We show that memorization plays a larger role in simpler, knowledge-intensive tasks, while generalization is the key for harder, reasoning-based tasks.
arXiv Detail & Related papers (2024-07-20T21:24:40Z)
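For the entry above, a small sketch of what a distributional-memorization score could look like. The frequency counts and probabilities are illustrative stand-ins, and the rank-correlation choice is an assumption rather than the paper's exact pipeline:

```python
import numpy as np
from scipy.stats import spearmanr

# Stand-in data: per-example frequency of task-relevant n-grams in the
# pretraining corpus, and the model's probability on the gold answer.
pretrain_freq = np.array([120, 5, 3400, 40, 980, 12, 7, 210])
model_prob = np.array([0.62, 0.11, 0.93, 0.35, 0.81, 0.09, 0.14, 0.47])

rho, pval = spearmanr(pretrain_freq, model_prob)
print(f"distributional memorization (Spearman rho) = {rho:.2f}, p = {pval:.3f}")
# rho near 1: performance tracks pretraining frequency (memorization-like);
# rho near 0: performance is insensitive to frequency (generalization-like).
```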
- A Notion of Complexity for Theory of Mind via Discrete World Models [2.487142846438629]
Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required.
This work proposes a framework inspired by cognitive load theory to measure the complexity of ToM tasks.
arXiv Detail & Related papers (2024-06-16T16:46:55Z)
- A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm.
Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources.
We also present an advanced algorithm that significantly simplifies the EM computational demands.
arXiv Detail & Related papers (2024-02-02T21:48:50Z)
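For the GLWS entry above, a toy EM loop for one common weak-supervision source: labels flipped with unknown probability. The noise model, logistic classifier, and update schedule are illustrative assumptions, not the paper's general formulation:

```python
# EM sketch for learning from noisy (weak) binary labels.
import numpy as np

rng = np.random.default_rng(1)

# Toy data: true labels from a linear rule; weak labels flipped w.p. 0.3.
d, n = 2, 500
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(n, d))
y_true = (X @ w_true > 0).astype(float)
flip = rng.random(n) < 0.3
y_weak = np.where(flip, 1 - y_true, y_true)

sigmoid = lambda z: 1 / (1 + np.exp(-z))
w, rho = np.zeros(d), 0.2          # classifier weights, estimated flip rate

for _ in range(50):
    # E-step: posterior that the true label is 1, given the weak label,
    # the current classifier, and the current noise estimate.
    p1 = sigmoid(X @ w)
    like1 = np.where(y_weak == 1, 1 - rho, rho)   # P(y_weak | y_true = 1)
    like0 = np.where(y_weak == 0, 1 - rho, rho)   # P(y_weak | y_true = 0)
    q = like1 * p1 / (like1 * p1 + like0 * (1 - p1))
    # M-step: refit the classifier on soft targets q, re-estimate rho.
    for _ in range(100):                          # a few gradient steps
        w += 0.1 * X.T @ (q - sigmoid(X @ w)) / n
    rho = np.mean(q * (y_weak == 0) + (1 - q) * (y_weak == 1))

print(f"estimated flip rate: {rho:.2f}")
print(f"accuracy vs. true labels: {np.mean((sigmoid(X @ w) > 0.5) == y_true):.2f}")
```

Other weak-supervision sources (partial labels, pairwise constraints, aggregate statistics) would swap in different likelihood terms in the E-step while keeping the same loop structure.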
- The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning [80.1018596899899]
We argue that neural network models share a preference for low-complexity solutions, formalized using Kolmogorov complexity.
Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences.
These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
arXiv Detail & Related papers (2023-04-11T17:22:22Z) - Neural Networks and the Chomsky Hierarchy [27.470857324448136]
We study whether insights from the Chomsky hierarchy can predict the limits of neural network generalization in practice.
We show negative results where even extensive amounts of data and training time never lead to any non-trivial generalization.
Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, and only networks augmented with structured memory can successfully generalize on context-free and context-sensitive tasks.
arXiv Detail & Related papers (2022-07-05T15:06:11Z)
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples.
Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
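For the Neural Complexity entry above, a rough sketch of the data-driven idea: fit a scalar predictor of the generalization gap from simple task descriptors across many heterogeneous toy tasks. The descriptors, task generator, and linear meta-model are all assumptions for illustration, not NC's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_task():
    """A toy linear regression task with random dimension and noise level."""
    d = int(rng.integers(2, 20))
    noise = float(rng.uniform(0.0, 1.0))
    w = rng.normal(size=d)
    X_tr, X_te = rng.normal(size=(30, d)), rng.normal(size=(500, d))
    y_tr = X_tr @ w + noise * rng.normal(size=30)
    y_te = X_te @ w + noise * rng.normal(size=500)
    return d, noise, X_tr, y_tr, X_te, y_te

features, gaps = [], []
for _ in range(300):
    d, noise, X_tr, y_tr, X_te, y_te = make_task()
    w_hat = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]   # fit each task
    train_err = np.mean((X_tr @ w_hat - y_tr) ** 2)
    test_err = np.mean((X_te @ w_hat - y_te) ** 2)
    features.append([d, noise, train_err, 1.0])          # task descriptors
    gaps.append(test_err - train_err)                    # generalization gap

# "Meta-learn" a scalar complexity score: a linear map from task descriptors
# to the observed gap, fit across all the heterogeneous tasks at once.
theta, *_ = np.linalg.lstsq(np.array(features), np.array(gaps), rcond=None)
print("complexity weights [dim, noise, train_err, bias]:", np.round(theta, 3))
```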