Effective dimension of machine learning models
- URL: http://arxiv.org/abs/2112.04807v1
- Date: Thu, 9 Dec 2021 10:00:18 GMT
- Title: Effective dimension of machine learning models
- Authors: Amira Abbas, David Sutter, Alessio Figalli, Stefan Woerner
- Abstract summary: Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning.
Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice.
We propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets.
- Score: 4.721845865189576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making statements about the performance of trained models on tasks involving new data is one of the primary goals of machine learning, i.e., to understand the generalization power of a model. Various capacity measures try to capture this ability, but usually fall short in explaining important characteristics of models that we observe in practice. In this study, we propose the local effective dimension as a capacity measure which seems to correlate well with generalization error on standard data sets. Importantly, we prove that the local effective dimension bounds the generalization error and discuss the aptness of this capacity measure for machine learning models.
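As a rough illustration of the quantity in the abstract, below is a minimal numpy sketch of how the local effective dimension could be estimated by Monte Carlo from Fisher information matrices sampled near a trained parameter point. The normalization of the Fisher information and the constant kappa = gamma * n / (2 * pi * log n) follow the paper's definition; the function name, input format, and neighbourhood sampling are assumptions.

```python
import numpy as np

def local_effective_dimension(fishers: np.ndarray, n: int, gamma: float = 1.0) -> float:
    """fishers: shape (k, d, d), empirical Fisher matrices at k parameter
    samples drawn from a small neighbourhood of the trained theta*.
    n: number of data samples; assumes kappa > 1 (i.e., n large enough)."""
    k, d, _ = fishers.shape
    # Normalize F so that its average trace equals d, as in the paper.
    fhat = fishers * (d * k / np.trace(fishers, axis1=1, axis2=2).sum())
    kappa = gamma * n / (2 * np.pi * np.log(n))
    # log of det(I + kappa * F_hat)^(1/2) per sample, via slogdet for stability.
    half_logdets = np.array(
        [0.5 * np.linalg.slogdet(np.eye(d) + kappa * f)[1] for f in fhat]
    )
    # Numerically stable log of the Monte Carlo average of the sqrt-determinants.
    m = half_logdets.max()
    log_mean = m + np.log(np.mean(np.exp(half_logdets - m)))
    return 2.0 * log_mean / np.log(kappa)
```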
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
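In its simplest form, the difference-in-differences design mentioned above compares how a per-instance quantity changes for trained (treated) versus held-out (control) instances over the same training interval. A hedged sketch with illustrative names (the paper's actual estimator is more involved):

```python
import numpy as np

def did_memorisation(treated_before, treated_after, control_before, control_after):
    """Each argument: per-instance log-likelihoods (np.ndarray)."""
    treated_change = np.mean(treated_after) - np.mean(treated_before)
    control_change = np.mean(control_after) - np.mean(control_before)
    # The control change absorbs generic improvement from training; the
    # residual is attributed to memorisation of the treated instances.
    return treated_change - control_change
```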
- Machine Learning vs Deep Learning: The Generalization Problem [0.0]
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z)
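An illustrative version of the experiment this abstract describes might look as follows; the specific models, ranges, and sample sizes are assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Train on an exponential function inside [0, 3] ...
x_train = rng.uniform(0, 3, (512, 1))
y_train = np.exp(x_train).ravel()
# ... then test on [3, 5], outside the training domain.
x_test = rng.uniform(3, 5, (256, 1))
y_test = np.exp(x_test).ravel()

for model in (RandomForestRegressor(),
              MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)):
    model.fit(x_train, y_train)
    mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(type(model).__name__, "out-of-domain MSE:", mse)
```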
- Generalizable Error Modeling for Human Data Annotation: Evidence From an Industry-Scale Search Data Annotation Program [0.0]
This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks.
We show that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications.
We demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the amount of corrected annotation errors.
arXiv Detail & Related papers (2023-10-08T21:21:19Z)
- Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that capture every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, you are given access to a set of expert models and their predictions, alongside some limited information about the datasets used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
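As a toy stand-in for the instance-wise idea (not the paper's Synthetic Model Combination algorithm), one could weight each expert by how plausible the test point looks under limited summary statistics of that expert's training data; all names here are assumptions:

```python
import numpy as np

def combine(experts, train_means, train_stds, x):
    """experts: list of callables; train_means/train_stds: per-expert
    feature means and standard deviations; x: feature vector."""
    # Gaussian plausibility of x under each expert's training-data summary.
    scores = np.array([
        np.exp(-0.5 * np.sum(((x - m) / s) ** 2))
        for m, s in zip(train_means, train_stds)
    ])
    weights = scores / scores.sum()
    preds = np.array([f(x) for f in experts])
    return weights @ preds  # instance-wise weighted prediction
```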
- One Objective for All Models -- Self-supervised Learning for Topic Models [11.67381769733002]
We highlight a key advantage of self-supervised learning -- when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model.
In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models.
arXiv Detail & Related papers (2022-02-02T06:20:59Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
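Methods in this vein often combine a weak, bias-prone model with the main model in log space during training, so the main model is pushed to explain what the biased model cannot. A minimal product-of-experts sketch, assuming PyTorch and illustrative names (it may not match the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def poe_loss(main_logits, biased_logits, labels):
    # Combine the two models in log space; gradients flow only through
    # the main model because the biased model's output is detached.
    combined = (F.log_softmax(main_logits, dim=-1)
                + F.log_softmax(biased_logits.detach(), dim=-1))
    # cross_entropy renormalizes the product distribution internally.
    return F.cross_entropy(combined, labels)  # at test time, use main_logits alone
```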
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Modeling Generalization in Machine Learning: A Methodological and Computational Study [0.8057006406834467]
We use the concept of the convex hull of the training data in assessing machine learning generalization.
We observe unexpectedly weak associations between the generalization ability of machine learning models and all metrics related to dimensionality.
arXiv Detail & Related papers (2020-06-28T19:06:16Z)
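Whether a test point lies in the convex hull of the training data, the notion this study builds on, can be decided with a small linear program; a hedged sketch with illustrative names:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(train: np.ndarray, x: np.ndarray) -> bool:
    """train: (n, d) training points; x: (d,) query point.
    x is in the hull iff there exist lambda >= 0 with sum(lambda) = 1
    and train.T @ lambda = x, so we solve a feasibility LP."""
    n = train.shape[0]
    A_eq = np.vstack([train.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    # Any objective works for a pure feasibility check; minimize 0.
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return res.success
```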
- The Conditional Entropy Bottleneck [8.797368310561058]
We characterize failures of robust generalization as failures of accuracy or related metrics on a held-out set.
We propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model.
In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB).
We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets.
arXiv Detail & Related papers (2020-02-13T07:46:38Z)
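A rough sketch of what a variational CEB objective can look like, assuming Gaussian forward and backward encoders and a classifier head in PyTorch; the parameterization, module names, and single-sample estimate are assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn.functional as F

def ceb_loss(x, y, encoder, backward_enc, classifier, beta=0.1):
    mu, logvar = encoder(x)                        # forward encoder e(z|x)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample
    bmu, blogvar = backward_enc(y)                 # backward encoder b(z|y)

    def log_gauss(z, m, lv):                       # Gaussian log-density up to a constant
        return (-0.5 * (lv + (z - m) ** 2 / lv.exp())).sum(-1)

    # Single-sample bound on I(X;Z|Y): log e(z|x) - log b(z|y);
    # the Gaussian normalization constants cancel between the two terms.
    residual = log_gauss(z, mu, logvar) - log_gauss(z, bmu, blogvar)
    ce = F.cross_entropy(classifier(z), y)         # bounds -I(Y;Z) up to H(Y)
    return beta * residual.mean() + ce
```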
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.