Model Successor Functions
- URL: http://arxiv.org/abs/2502.00197v1
- Date: Fri, 31 Jan 2025 22:27:09 GMT
- Title: Model Successor Functions
- Authors: Yingshan Chang, Yonatan Bisk
- Abstract summary: In inductive generalization, it is often assumed that the training data lie on the easier side, while the testing data lie on the harder side. This work provides a formalization that centers on the concept of model successors. Then we outline directions to adapt well-established techniques towards the learning of model successors.
- Score: 31.25792515137003
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The notion of generalization has moved away from the classical one defined in statistical learning theory towards an emphasis on out-of-domain generalization (OODG). Recently, there has been a growing focus on inductive generalization, where a progression of difficulty implicitly governs the direction of domain shifts. In inductive generalization, it is often assumed that the training data lie on the easier side, while the testing data lie on the harder side. The challenge is that training data are always finite, but a learner is expected to infer an inductive principle that could be applied in an unbounded manner. This emerging regime has appeared in the literature under different names, such as length/logical/algorithmic extrapolation, but a formal definition is lacking. This work provides such a formalization, centered on the concept of model successors. We then outline directions for adapting well-established techniques towards the learning of model successors. This work calls for a restructuring of the research discussion around inductive generalization, from fragmented task-centric communities to a more unified effort focused on universal properties of learning and computation.
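The successor idea in the abstract can be illustrated with a minimal toy sketch (my own construction for illustration, not the paper's formalization): a "model successor" maps a solver for difficulty level n to a solver for level n+1, so unrolling the successor applies the inductive principle in an unbounded manner. The task here (summing a list, with length as difficulty) and all function names are illustrative assumptions.

```python
# Toy sketch of a "model successor" (illustrative, not the paper's formalism):
# difficulty n = input length n; a successor lifts a level-n solver to level n+1.

def base_model(xs):
    # solves the task at difficulty 1: sum of a single-element list
    assert len(xs) == 1
    return xs[0]

def successor(model_n, n):
    # builds a level-(n+1) model by composing the level-n model with one more step
    def model_n_plus_1(xs):
        assert len(xs) == n + 1
        return model_n(xs[:n]) + xs[n]
    return model_n_plus_1

# Unrolling the successor yields a solver for any finite difficulty,
# mirroring how an inductive principle is applied in an unbounded manner.
model = base_model
for n in range(1, 5):
    model = successor(model, n)
print(model([1, 2, 3, 4, 5]))  # 15
```

The training data being finite corresponds to observing only a few unrollings; the learner's job, in this caricature, is to recover `successor` itself rather than any fixed-level `model`.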
Related papers
- The Role of Sparsity for Length Generalization in Transformers [58.65997625433689]
We propose a new theoretical framework to study length generalization for the next-token prediction task.
We show that length generalization occurs as long as each predicted token depends on a small (fixed) number of previous tokens.
We introduce Predictive Position Coupling, which trains the transformer to predict the position IDs used in a positional coupling approach.
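The sparsity condition above can be caricatured in a short sketch (an illustrative assumption, not the paper's framework): when the rule for the next token depends only on a fixed window of k previous tokens, the same rule applies verbatim at any sequence length, which is why sparsity supports length generalization.

```python
# Toy sketch: a next-token rule that depends on only the last k tokens
# is length-agnostic, so it transfers from short to long sequences.

def next_token(seq, k=2):
    # hypothetical rule: next token is the sum of the last k tokens (mod 10)
    return sum(seq[-k:]) % 10

short = [1, 2]
long = [1, 2] * 50
# the rule inferred on short inputs applies unchanged to much longer ones
print(next_token(short), next_token(long))  # 3 3
```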
arXiv Detail & Related papers (2025-02-24T03:01:03Z) - Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective [43.510860711231544]
State-of-the-art self-supervised methods tend to enhance either generalizability or discriminability, but not both simultaneously. We propose a novel self-supervised learning method that leverages advancements in reinforcement learning to jointly benefit from the general guidance of evolutionary game theory (EGT).
arXiv Detail & Related papers (2024-11-30T17:20:23Z) - Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity [84.12126298229866]
We show that zero-shot generalization during instruction tuning happens very early.
We also show that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization.
For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level.
arXiv Detail & Related papers (2024-06-17T16:40:21Z) - On the Generalization Ability of Unsupervised Pretraining [53.06175754026037]
Recent advances in unsupervised learning have shown that unsupervised pre-training, followed by fine-tuning, can improve model generalization.
This paper introduces a novel theoretical framework that illuminates the critical factor influencing the transferability of knowledge acquired during unsupervised pre-training to the subsequent fine-tuning phase.
Our results contribute to a better understanding of the unsupervised pre-training and fine-tuning paradigm, and can shed light on the design of more effective pre-training algorithms.
arXiv Detail & Related papers (2024-03-11T16:23:42Z) - Class-wise Generalization Error: an Information-Theoretic Analysis [22.877440350595222]
We study the class-generalization error, which quantifies the generalization performance of each individual class.
We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior.
arXiv Detail & Related papers (2024-01-05T17:05:14Z) - Inverse Decision Modeling: Learning Interpretable Representations of Behavior [72.80902932543474]
We develop an expressive, unifying perspective on inverse decision modeling.
We use this to formalize the inverse problem (as a descriptive model).
We illustrate how this structure enables learning (interpretable) representations of (bounded) rationality.
arXiv Detail & Related papers (2023-10-28T05:05:01Z) - The Ideal Continual Learner: An Agent That Never Forgets [11.172382217477129]
The goal of continual learning is to find a model that solves multiple learning tasks which are presented sequentially to the learner.
A key challenge in this setting is that the learner may forget how to solve a previous task when learning a new task, a phenomenon known as catastrophic forgetting.
This paper proposes a new continual learning framework called Ideal Continual Learner (ICL) which is guaranteed to avoid catastrophic forgetting by construction.
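The "avoids forgetting by construction" idea can be sketched with a deliberately naive caricature (my own assumption-laden illustration, not the ICL framework itself): a learner that retains every task's data and refits jointly on the union can never forget an earlier task.

```python
# Naive caricature of an "ideal continual learner": keep all past task data
# and refit on the union, so earlier tasks remain solved by construction.

class IdealLearner:
    def __init__(self):
        self.memory = []   # all (x, y) pairs seen across tasks
        self.table = {}    # the "model": a lookup fit to memory

    def learn_task(self, pairs):
        self.memory.extend(pairs)
        # refit jointly on every task seen so far
        self.table = dict(self.memory)

    def predict(self, x):
        return self.table.get(x)

learner = IdealLearner()
learner.learn_task([(1, "odd"), (2, "even")])    # task A
learner.learn_task([(10, "even"), (11, "odd")])  # task B
print(learner.predict(1))  # still correct after learning task B
```

The interesting part of the actual framework is doing this without unbounded memory; the sketch only shows why joint refitting rules out catastrophic forgetting.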
arXiv Detail & Related papers (2023-04-29T18:06:14Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our methods are simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
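The local Lipschitz regularity these bounds depend on can be estimated numerically, as in this hedged sketch (the prediction function and sampling scheme are illustrative assumptions, not the paper's method): sample small perturbations around a data point and take the largest observed slope.

```python
# Hedged sketch: estimate the local Lipschitz constant of a prediction
# function around a point by sampling small perturbations.
import random

def predict(x):
    # hypothetical smooth prediction function
    return 0.5 * x + 0.1 * x * x

def local_lipschitz(f, x, radius=0.01, samples=1000, seed=0):
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        dx = rng.uniform(-radius, radius)
        if dx != 0:
            best = max(best, abs(f(x + dx) - f(x)) / abs(dx))
    return best

# f'(x) = 0.5 + 0.2x, so near x = 1 the local slope is about 0.7
print(round(local_lipschitz(predict, 1.0), 2))
```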
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Towards Principled Disentanglement for Domain Generalization [90.9891372499545]
A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data.
We first formalize the OOD generalization problem as a constrained optimization problem, called Disentanglement-constrained Domain Generalization (DDG).
Based on the transformation, we propose a primal-dual algorithm for joint representation disentanglement and domain generalization.
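A generic primal-dual scheme of the kind mentioned above looks like the following sketch (a textbook pattern on a toy problem, not the DDG algorithm itself): gradient descent on the primal variable and projected gradient ascent on the multiplier of the constraint.

```python
# Generic primal-dual sketch (not DDG): minimize f(x) subject to g(x) <= 0
# via gradient descent on x and projected gradient ascent on the multiplier.

def f(x):
    return (x - 3.0) ** 2        # objective

def g(x):
    return x - 1.0               # constraint: x <= 1

def grad_f(x):
    return 2.0 * (x - 3.0)

def grad_g(x):
    return 1.0

x, lam, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    x -= lr * (grad_f(x) + lam * grad_g(x))   # primal descent on the Lagrangian
    lam = max(0.0, lam + lr * g(x))           # dual ascent, projected to lam >= 0

print(round(x, 2))  # settles near the constrained optimum x = 1
```

The unconstrained minimizer is x = 3, but the active constraint pins the iterates to x = 1 with multiplier near 4, which is the saddle point of the Lagrangian.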
arXiv Detail & Related papers (2021-11-27T07:36:32Z) - Explaining generalization in deep learning: progress and fundamental
limits [8.299945169799795]
In the first part of the thesis, we will empirically study how training deep networks via gradient descent implicitly controls the networks' capacity.
We will then derive data-dependent, uniform-convergence-based generalization bounds with improved dependencies on the parameter count.
In the last part of the thesis, we will introduce an empirical technique to estimate generalization using unlabeled data.
arXiv Detail & Related papers (2021-10-17T21:17:30Z) - Distinguishing rule- and exemplar-based generalization in learning
systems [10.396761067379195]
We investigate two distinct inductive biases: feature-level bias and exemplar-vs-rule bias.
We find that most standard neural network models have a propensity towards exemplar-based extrapolation.
We discuss the implications of these findings for research on data augmentation, fairness, and systematic generalization.
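The rule-vs-exemplar contrast can be made concrete with a toy sketch (an illustrative assumption, not the paper's experimental setup): a rule learner applies the inferred function everywhere, while an exemplar learner copies the label of the nearest training point and so fails outside the training range.

```python
# Toy contrast: rule-based vs exemplar-based extrapolation.
train = [(x, 2 * x) for x in range(5)]   # underlying rule: y = 2x

def rule_predict(x):
    # rule-based: apply the inferred linear rule at any input
    return 2 * x

def exemplar_predict(x):
    # exemplar-based: copy the label of the nearest training input
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

# inside the training range both agree; outside, only the rule extrapolates
print(rule_predict(10), exemplar_predict(10))  # 20 8
```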
arXiv Detail & Related papers (2021-10-08T18:37:59Z) - Target Languages (vs. Inductive Biases) for Learning to Act and Plan [13.820550902006078]
I articulate a different learning approach where representations do not emerge from biases in a neural architecture but are learned over a given target language with a known semantics.
The goals of the paper and talk are to make these ideas explicit, to place them in a broader context where the design of the target language is crucial, and to illustrate them in the context of learning to act and plan.
arXiv Detail & Related papers (2021-09-15T10:24:13Z) - Towards Out-Of-Distribution Generalization: A Survey [46.329995334444156]
Out-of-Distribution generalization is an emerging topic of machine learning research.
This paper represents the first comprehensive, systematic review of OOD generalization.
arXiv Detail & Related papers (2021-08-31T05:28:42Z) - A Self-Supervised Framework for Function Learning and Extrapolation [1.9374999427973014]
We present a framework for how a learner may acquire representations that support generalization.
We show the resulting representations outperform those from other models for unsupervised time series learning.
arXiv Detail & Related papers (2021-06-14T12:41:03Z) - Recent advances in deep learning theory [104.01582662336256]
This paper reviews and organizes the recent advances in deep learning theory.
The literature is categorized in six groups: (1) complexity and capacity-based approaches for analysing the generalizability of deep learning; (2) differential equations and their dynamic systems for modelling gradient descent and its variants; (3) the geometrical structures of the loss landscape that drive the trajectories of the dynamic systems; and (5) theoretical foundations of several special structures in network architectures.
arXiv Detail & Related papers (2020-12-20T14:16:41Z) - In Search of Robust Measures of Generalization [79.75709926309703]
We develop bounds on generalization error, optimization error, and excess risk.
When evaluated empirically, most of these bounds are numerically vacuous.
We argue that generalization measures should instead be evaluated within the framework of distributional robustness.
arXiv Detail & Related papers (2020-10-22T17:54:25Z) - Developing Constrained Neural Units Over Time [81.19349325749037]
This paper focuses on an alternative way of defining Neural Networks, that is different from the majority of existing approaches.
The structure of the neural architecture is defined by means of a special class of constraints that are extended also to the interaction with data.
The proposed theory is cast into the time domain, in which data are presented to the network in an ordered manner.
arXiv Detail & Related papers (2020-09-01T09:07:25Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.