Model Successor Functions
- URL: http://arxiv.org/abs/2502.00197v1
- Date: Fri, 31 Jan 2025 22:27:09 GMT
- Title: Model Successor Functions
- Authors: Yingshan Chang, Yonatan Bisk,
- Abstract summary: In inductive generalization, it is often assumed that the training data lie in the easier side, while the testing data lie in the harder side.
This work provides a formalization that centers on the concept of model successors.
Then we outline directions to adapt well-established techniques towards the learning of model successors.
- Score: 31.25792515137003
- License:
- Abstract: The notion of generalization has moved away from the classical one defined in statistical learning theory towards an emphasis on out-of-domain generalization (OODG). Recently, there is a growing focus on inductive generalization, where a progression of difficulty implicitly governs the direction of domain shifts. In inductive generalization, it is often assumed that the training data lie in the easier side, while the testing data lie in the harder side. The challenge is that training data are always finite, but a learner is expected to infer an inductive principle that could be applied in an unbounded manner. This emerging regime has appeared in the literature under different names, such as length/logical/algorithmic extrapolation, but a formal definition is lacking. This work provides such a formalization that centers on the concept of model successors. Then we outline directions to adapt well-established techniques towards the learning of model successors. This work calls for restructuring of the research discussion around inductive generalization from fragmented task-centric communities to a more unified effort, focused on universal properties of learning and computation.
Related papers
- Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG)
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
arXiv Detail & Related papers (2021-11-27T07:36:32Z) - Explaining generalization in deep learning: progress and fundamental
limits [8.299945169799795]
In the first part of the thesis, we will empirically study how training deep networks via gradient descent implicitly controls the networks' capacity.
We will then derive em data-dependent em uniform-convergence-based generalization bounds with improved dependencies on the parameter count.
In the last part of the thesis, we will introduce an em empirical technique to estimate generalization using unlabeled data.
arXiv Detail & Related papers (2021-10-17T21:17:30Z) - Distinguishing rule- and exemplar-based generalization in learning
systems [10.396761067379195]
We investigate two distinct inductive biases: feature-level bias and exemplar-vs-rule bias.
We find that most standard neural network models have a propensity towards exemplar-based extrapolation.
We discuss the implications of these findings for research on data augmentation, fairness, and systematic generalization.
arXiv Detail & Related papers (2021-10-08T18:37:59Z) - Towards Out-Of-Distribution Generalization: A Survey [46.329995334444156]
Out-of-Distribution generalization is an emerging topic of machine learning research.
This paper represents the first comprehensive, systematic review of OOD generalization.
arXiv Detail & Related papers (2021-08-31T05:28:42Z) - In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.