Limits of Model Selection under Transfer Learning
- URL: http://arxiv.org/abs/2305.00152v4
- Date: Thu, 12 Oct 2023 14:05:03 GMT
- Title: Limits of Model Selection under Transfer Learning
- Authors: Steve Hanneke, Samory Kpotufe, Yasaman Mahdaviyeh
- Abstract summary: We study model selection under transfer learning, where a new complexity term arises: the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class.
In particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates.
- Score: 18.53111473571927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however, in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task.
Now, in addition to the usual tradeoffs between approximation and estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class.
We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., those achievable given knowledge of such distances.
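To make the model-selection setting concrete, here is a minimal sketch (illustrative only, not the paper's adaptive procedure or analysis): a few candidate hypothesis classes, here k-NN classifiers with different neighborhood sizes, are each trained on pooled source and target data and compared on a small held-out target sample. The toy distributions, the candidate set, and the hold-out selection rule are all assumptions made for this example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Illustrative sketch: choose among candidate hypothesis classes (k-NN with
# different k) using abundant source data plus a scarce labelled target sample.
# This is a generic hold-out selection rule, not the procedure from the paper.

rng = np.random.default_rng(0)

def make_task(n, shift=0.0):
    """Toy 2-D binary classification task; `shift` moves the decision boundary."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > shift).astype(int)
    return X, y

# Plentiful source data; scarce target data whose boundary is shifted.
X_src, y_src = make_task(n=2000, shift=0.0)
X_tgt, y_tgt = make_task(n=100, shift=0.3)

# Hold out half of the target sample to compare the candidate classes.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_tgt, y_tgt, test_size=0.5, random_state=0
)

candidates = [1, 5, 15, 45]  # "hypothesis classes": k-NN with different k
best_k, best_err = None, np.inf
for k in candidates:
    clf = KNeighborsClassifier(n_neighbors=k)
    # Pool the source data with the small target fitting split.
    clf.fit(np.vstack([X_src, X_fit]), np.concatenate([y_src, y_fit]))
    err = 1.0 - clf.score(X_val, y_val)  # held-out estimate of target risk
    if err < best_err:
        best_k, best_err = k, err

print(f"selected k = {best_k}, held-out target error = {best_err:.3f}")
```

A data-driven rule of this kind has no access to how close source and target actually are for each candidate class; that is precisely the distributional information an oracle in the abstract's sense would exploit, and the gap between the two is what can make adaptive rates arbitrarily slower than oracle rates.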
Related papers
- Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems [39.58317527488534]
Data-driven population-level distributions are emerging as an appealing alternative to simple parametric priors in inverse problems.
It is difficult to acquire independent and identically distributed samples from the underlying data-generating process of interest to train these models.
We show that starting from a misspecified prior distribution, the updated distribution becomes progressively closer to the underlying population-level distribution.
arXiv Detail & Related papers (2024-07-24T22:39:27Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z) - Demystifying Disagreement-on-the-Line in High Dimensions [34.103373453782744]
We develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression.
Experiments on CIFAR-10-C, Tiny ImageNet-C, and Camelyon17 are consistent with our theory and support the universality of the theoretical findings.
arXiv Detail & Related papers (2023-01-31T02:31:18Z) - PatchMix Augmentation to Identify Causal Features in Few-shot Learning [55.64873998196191]
Few-shot learning aims to transfer knowledge learned from base categories with sufficient labelled data to novel categories with scarce known information.
We propose a novel data augmentation strategy dubbed PatchMix that can break this spurious dependency.
We show that such an augmentation mechanism, different from existing ones, is able to identify the causal features.
arXiv Detail & Related papers (2022-11-29T08:41:29Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z) - An Information-theoretic Approach to Distribution Shifts [9.475039534437332]
Safely deploying machine learning models to the real world is often a challenging process.
Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere.
Neural networks that are fit to a subset of the population might carry some selection bias into their decision process.
arXiv Detail & Related papers (2021-06-07T16:44:21Z) - Fundamental Limits and Tradeoffs in Invariant Representation Learning [99.2368462915979]
Many machine learning applications involve learning representations that achieve two competing goals.
A minimax game-theoretic formulation represents a fundamental tradeoff between accuracy and invariance.
We provide an information-theoretic analysis of this general and important problem under both classification and regression settings.
arXiv Detail & Related papers (2020-12-19T15:24:04Z) - Moment-Based Domain Adaptation: Learning Bounds and Algorithms [1.827510863075184]
This thesis contributes to the mathematical foundation of domain adaptation as an emerging field in machine learning.
In contrast to classical statistical learning, the framework of domain adaptation takes into account deviations between probability distributions in the training and application setting.
arXiv Detail & Related papers (2020-04-22T14:59:08Z)