Precise High-Dimensional Asymptotics for Quantifying Heterogeneous
Transfers
- URL: http://arxiv.org/abs/2010.11750v3
- Date: Fri, 11 Aug 2023 01:45:43 GMT
- Title: Precise High-Dimensional Asymptotics for Quantifying Heterogeneous
Transfers
- Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su
- Abstract summary: When is combining data from two tasks better than learning one task alone?
This paper uses random matrix theory to tackle this challenge in a linear regression setting with two tasks.
- Score: 34.40702005466919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of learning one task with samples from another task has received
much interest recently. In this paper, we ask a fundamental question: when is
combining data from two tasks better than learning one task alone? Intuitively,
the transfer effect from one task to another task depends on dataset shifts
such as sample sizes and covariance matrices. However, quantifying such a
transfer effect is challenging since we need to compare the risks between joint
learning and single-task learning, and the comparative advantage of one over
the other depends on the exact kind of dataset shift between both tasks. This
paper uses random matrix theory to tackle this challenge in a linear regression
setting with two tasks. We give precise asymptotics about the excess risks of
some commonly used estimators in the high-dimensional regime, when the sample
sizes increase proportionally with the feature dimension at fixed ratios. The
precise asymptotics are provided as a function of the sample sizes and the
covariate/model shifts, and can be used to study transfer effects: In a
random-effects model, we give conditions to determine positive and negative
transfers between learning two tasks versus single-task learning; the
conditions reveal intricate relations between dataset shifts and transfer
effects. Simulations justify the validity of the asymptotics in finite
dimensions. Our analysis examines several functions of two different sample
covariance matrices, revealing some estimates that generalize classical results
in the random matrix theory literature, which may be of independent interest.
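The simulations mentioned in the abstract suggest a simple way to observe positive and negative transfer numerically. The sketch below is a minimal Monte Carlo illustration, not the paper's precise asymptotic formulas or its exact estimators: it assumes Gaussian features, a pooled least-squares "joint" estimator, and illustrative values for the sample sizes, covariances, and coefficient shift, then compares the excess risk on task 1 of single-task least squares against joint learning on the combined data.

```python
import numpy as np

# Minimal Monte Carlo sketch (not the paper's derivation): compare the excess
# risk on task 1 of single-task least squares vs. pooled ("joint") least
# squares when two linear regression tasks differ in sample size, covariance
# (covariate shift), and coefficients (model shift). All settings below are
# illustrative choices, not values from the paper.

rng = np.random.default_rng(0)

p = 200                      # feature dimension
n1, n2 = 400, 800            # sample sizes scale proportionally with p
sigma = 0.5                  # noise level
model_shift = 0.3            # how far task 2's coefficients are from task 1's

# Covariate shift: task 1 has isotropic features, task 2 a scaled covariance.
cov1 = np.eye(p)
cov2 = 2.0 * np.eye(p)

beta1 = rng.normal(size=p) / np.sqrt(p)                         # task 1 model
beta2 = beta1 + model_shift * rng.normal(size=p) / np.sqrt(p)   # shifted task 2

def sample_task(n, cov, beta):
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y

def excess_risk(beta_hat, beta, cov):
    d = beta_hat - beta
    return float(d @ cov @ d)   # E[(x^T beta_hat - x^T beta)^2] under cov

n_trials = 20
single, joint = [], []
for _ in range(n_trials):
    X1, y1 = sample_task(n1, cov1, beta1)
    X2, y2 = sample_task(n2, cov2, beta2)

    # Single-task learning: least squares on task 1 data only.
    b_single = np.linalg.lstsq(X1, y1, rcond=None)[0]

    # Joint learning: least squares on the pooled data from both tasks.
    Xp, yp = np.vstack([X1, X2]), np.concatenate([y1, y2])
    b_joint = np.linalg.lstsq(Xp, yp, rcond=None)[0]

    single.append(excess_risk(b_single, beta1, cov1))
    joint.append(excess_risk(b_joint, beta1, cov1))

print(f"single-task excess risk on task 1: {np.mean(single):.4f}")
print(f"joint (pooled) excess risk on task 1: {np.mean(joint):.4f}")
```

If the joint estimator's average excess risk falls below the single-task value, the chosen dataset shift yields positive transfer for task 1; otherwise the transfer is negative.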
Related papers
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives.
arXiv Detail & Related papers (2024-07-03T07:54:08Z) - The Joint Effect of Task Similarity and Overparameterization on
Catastrophic Forgetting -- An Analytical Model [36.766748277141744]
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks.
Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization.
This paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model.
arXiv Detail & Related papers (2024-01-23T10:16:44Z) - A Semiparametric Efficient Approach To Label Shift Estimation and
Quantification [0.0]
We present a new procedure, SELSE, which estimates the shift in the response variable's distribution.
We prove that SELSE's normalized error has the smallest possible variance matrix compared to any other algorithm in that family.
arXiv Detail & Related papers (2022-11-07T07:49:29Z) - Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces the domain-specific solutions to stay close to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with a simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how, under some assumptions, our model can handle more than one nuisance variable and enables the analysis of pooled scientific datasets in scenarios that would otherwise require removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z) - A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms [6.193838300896449]
We study transfer learning from a Bayesian perspective, where a parametric statistical model is used.
Specifically, we study three variants of transfer learning problems, instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-09-03T08:43:29Z) - Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z) - Double Double Descent: On Generalization Errors in Transfer Learning
between Linear Regression Tasks [30.075430694663293]
We study the transfer learning process between two linear regression problems.
We examine a parameter transfer mechanism whereby a subset of the target task solution's parameters is constrained to the values learned for a related source task; a minimal illustrative sketch of this mechanism appears after this list.
arXiv Detail & Related papers (2020-06-12T08:42:14Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
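As a concrete illustration of the parameter-transfer mechanism described in the "Double Double Descent" entry above, the following sketch freezes a chosen subset of a target linear regression's coordinates at values estimated on a source task and re-fits only the remaining coordinates on the target data. The variable names, dimensions, and the choice of transferred coordinates are hypothetical assumptions for illustration, not the authors' code or exact setup.

```python
import numpy as np

# Illustrative sketch of parameter transfer between two linear regression
# tasks: a subset of the target solution's coordinates is frozen to values
# learned on the source task, and only the free coordinates are re-fit on
# the target data. All names and settings are hypothetical.

rng = np.random.default_rng(1)
p, n_src, n_tgt = 100, 500, 60      # target is data-poor, source is data-rich

beta_src = rng.normal(size=p) / np.sqrt(p)
beta_tgt = beta_src + 0.1 * rng.normal(size=p) / np.sqrt(p)   # related tasks

X_src = rng.normal(size=(n_src, p))
y_src = X_src @ beta_src + 0.3 * rng.normal(size=n_src)
X_tgt = rng.normal(size=(n_tgt, p))
y_tgt = X_tgt @ beta_tgt + 0.3 * rng.normal(size=n_tgt)

# Step 1: learn the source task by least squares.
b_src_hat = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Step 2: choose which coordinates to transfer (here: the first half).
transfer = np.zeros(p, dtype=bool)
transfer[: p // 2] = True
free = ~transfer

# Step 3: freeze transferred coordinates at the source estimates and fit
# the remaining coordinates on the target residuals.
residual = y_tgt - X_tgt[:, transfer] @ b_src_hat[transfer]
b_free = np.linalg.lstsq(X_tgt[:, free], residual, rcond=None)[0]

b_transfer = np.empty(p)
b_transfer[transfer] = b_src_hat[transfer]
b_transfer[free] = b_free

err = np.linalg.norm(b_transfer - beta_tgt) ** 2
print(f"parameter-transfer estimation error: {err:.4f}")
```

Varying how many coordinates are frozen trades off the bias from the source/target model shift against the variance of fitting free coordinates on the small target sample, which is the regime where generalization-error curves of the kind studied in that paper arise.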
This list is automatically generated from the titles and abstracts of the papers in this site.