Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers
- URL: http://arxiv.org/abs/2010.11750v4
- Date: Mon, 28 Apr 2025 08:57:58 GMT
- Title: Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers
- Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su
- Abstract summary: The problem of learning one task with samples from another task is central to transfer learning (TL). In this paper, we examine a fundamental question: When does combining the data samples from a source task and a target task perform better than single-task learning with the target task alone?
- Score: 66.66228496844191
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The problem of learning one task with samples from another task is central to transfer learning (TL). In this paper, we examine a fundamental question: When does combining the data samples from a source task and a target task perform better than single-task learning with the target task alone? This question is motivated by an intriguing phenomenon known as negative transfer often observed in the TL literature. Precise quantification of TL effects -- even within simple statistical models -- has remained elusive in the statistical learning literature. A critical challenge is that to compare TL to single-task learning, we would need to compare the risks between two different estimators in a very precise way. In particular, the comparative advantage of one estimator over another would depend on the specific distribution shifts between the two tasks. This paper applies recent developments in the random matrix theory literature to tackle this challenge in a high-dimensional linear regression setting with two tasks. We provide precise high-dimensional asymptotics for the bias and variance of hard parameter sharing (HPS) estimators in the proportional limit, when the sample sizes of both tasks increase proportionally with dimension at fixed ratios. The precise asymptotics are expressed as a function of the sample sizes of both tasks, the covariate shift between their feature population covariance matrices, and the model shift. We provide illustrative examples of our results in a random-effects model to determine positive and negative transfers. For example, we can identify a phase transition in the high-dimensional linear regression setting from positive transfer to negative transfer under a model shift between the source and target tasks. The finding regarding phase transition can be extended to a multiple-task learning setting where the feature covariates are shared across all tasks.
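The comparison at the heart of the paper is easy to probe numerically. Below is a minimal simulation sketch, not the authors' code: isotropic Gaussian features, min-norm least squares for single-task learning, and naive data pooling as a stand-in for the HPS estimator. As the model shift between the tasks grows, the pooled estimator's target risk overtakes the single-task risk, illustrating the positive-to-negative transfer phase transition described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_risk(beta_hat, beta_target):
    # With isotropic features, the excess prediction risk reduces to the
    # squared parameter error ||beta_hat - beta_target||^2.
    return np.sum((beta_hat - beta_target) ** 2)

def simulate(p=200, n_t=100, n_s=300, shift=0.5, noise=0.5, reps=20):
    """Average target-task excess risk of single-task vs pooled least squares.

    `shift` scales a random perturbation of the source coefficients,
    playing the role of the model shift between the two tasks."""
    risks = np.zeros(2)
    for _ in range(reps):
        beta_t = rng.normal(size=p) / np.sqrt(p)                   # target model
        beta_s = beta_t + shift * rng.normal(size=p) / np.sqrt(p)  # shifted source model
        X_t, X_s = rng.normal(size=(n_t, p)), rng.normal(size=(n_s, p))
        y_t = X_t @ beta_t + noise * rng.normal(size=n_t)
        y_s = X_s @ beta_s + noise * rng.normal(size=n_s)
        # Single-task learning: (min-norm) least squares on target data alone.
        b_single = np.linalg.lstsq(X_t, y_t, rcond=None)[0]
        # HPS-style pooling: one shared parameter vector fit on both tasks.
        b_pool = np.linalg.lstsq(np.vstack([X_t, X_s]),
                                 np.concatenate([y_t, y_s]), rcond=None)[0]
        risks += [excess_risk(b_single, beta_t), excess_risk(b_pool, beta_t)]
    return risks / reps

for shift in [0.0, 0.25, 0.5, 1.0, 2.0]:
    single, pooled = simulate(shift=shift)
    print(f"shift={shift:4.2f}  single-task risk={single:.3f}  pooled risk={pooled:.3f}")
```

For small shifts the pooled fit wins (positive transfer); past a critical shift it loses (negative transfer), which is the phase transition the precise asymptotics locate exactly.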
Related papers
- Knowledge Composition using Task Vectors with Learned Anisotropic Scaling [51.4661186662329]
We introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level.
We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters.
We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives; a toy version of the combination step is sketched after this entry.
arXiv Detail & Related papers (2024-07-03T07:54:08Z)
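A minimal sketch of the combination step described above: per-block coefficients scale each task vector before it is added to the base model. The block shapes, sizes, and setup are illustrative assumptions, not the aTLAS implementation.

```python
import torch

def compose(base_params, task_vectors, coeffs):
    """Combine base parameters with task vectors using per-block coefficients.

    base_params: list of tensors, one per parameter block.
    task_vectors: list of task vectors, each a list of blocks like base_params.
    coeffs: (num_tasks, num_blocks) learnable scalars, so every block of every
    task vector is scaled independently (anisotropic scaling)."""
    composed = []
    for b, block in enumerate(base_params):
        delta = sum(coeffs[t, b] * task_vectors[t][b]
                    for t in range(len(task_vectors)))
        composed.append(block + delta)
    return composed

# Hypothetical usage: two task vectors over three parameter blocks; the
# coefficient matrix is the only thing gradient descent would update.
base = [torch.randn(4, 4) for _ in range(3)]
tvs = [[torch.randn(4, 4) * 0.01 for _ in range(3)] for _ in range(2)]
coeffs = torch.zeros(2, 3, requires_grad=True)
params = compose(base, tvs, coeffs)
```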
- Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis [53.38518232934096]
Multi-task learning (MTL) is a powerful machine learning paradigm designed to leverage shared knowledge across tasks to improve generalization and performance.
We propose an MTL approach at the intersection between task clustering and feature transformation based on a two-phase iterative aggregation of targets and features.
In both phases, a key aspect is preserving the interpretability of the reduced targets and features by aggregating with the mean, a choice motivated by applications in Earth science.
arXiv Detail & Related papers (2024-06-12T08:30:16Z)
- The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model [36.766748277141744]
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks.
Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization.
This paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model.
arXiv Detail & Related papers (2024-01-23T10:16:44Z)
- FairBranch: Mitigating Bias Transfer in Fair Multi-task Learning [15.319254128769973]
Multi-Task Learning (MTL) suffers when unrelated tasks negatively impact each other by updating shared parameters with conflicting gradients.
This is known as negative transfer and leads to a drop in MTL accuracy compared to single-task learning (STL).
arXiv Detail & Related papers (2023-10-20T18:07:15Z)
- Generalization Performance of Transfer Learning: Overparameterized and Underparameterized Regimes [61.22448274621503]
In real-world applications, tasks often exhibit partial similarity, where certain aspects are similar while others are different or irrelevant.
Our study explores various types of transfer learning, encompassing two options for parameter transfer.
We provide practical guidelines for determining the number of features in the common and task-specific parts for improved generalization performance.
arXiv Detail & Related papers (2023-06-08T03:08:40Z)
- Transferability Estimation Based On Principal Gradient Expectation [68.97403769157117]
An ideal cross-task transferability metric should be consistent with the actual transfer results while remaining self-consistent.
Existing transferability metrics are estimated on a particular model by relating source and target tasks.
We propose Principal Gradient Expectation (PGE), a simple yet effective method for assessing transferability across tasks; a rough sketch follows this entry.
arXiv Detail & Related papers (2022-11-29T15:33:02Z)
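One plausible reading of the PGE recipe, sketched below; the restart-averaging and the cosine-similarity scoring rule are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def principal_gradient(grad_fn, theta0, restarts=10, rng=None):
    """Estimate a task's expected gradient by averaging over perturbed
    initializations; grad_fn(theta) returns the task loss gradient."""
    rng = rng or np.random.default_rng(0)
    grads = [grad_fn(theta0 + 0.01 * rng.normal(size=theta0.shape))
             for _ in range(restarts)]
    return np.mean(grads, axis=0)

def transferability_score(g_source, g_target):
    # Assumed scoring rule: cosine similarity of the two expected
    # gradients; larger values suggest easier transfer.
    return g_source @ g_target / (np.linalg.norm(g_source) * np.linalg.norm(g_target))

# Hypothetical usage with quadratic losses around nearby task optima.
g_s = principal_gradient(lambda th: 2 * (th - 1.0), np.zeros(5))
g_t = principal_gradient(lambda th: 2 * (th - 1.2), np.zeros(5))
print(transferability_score(g_s, g_t))  # close to 1: closely related tasks
```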
- A Semiparametric Efficient Approach To Label Shift Estimation and Quantification [0.0]
We present a new procedure called SELSE, which estimates the shift in the response variable's distribution.
We prove that SELSE's normalized error has the smallest possible variance matrix compared to any other algorithm in that family.
arXiv Detail & Related papers (2022-11-07T07:49:29Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that forces the domain-specific solutions to remain close to a central function, as sketched after this entry.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
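A minimal sketch of one such formulation for linear models: the proximity constraint is approximated by a quadratic penalty, and the central function is taken to be the mean of the task solutions. The penalty form, solver, and names are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

def constrained_mtl(Xs, ys, eps=1.0, iters=50):
    """Fit per-task linear models that stay close to a shared central model.

    Approximates  min sum_i ||X_i w_i - y_i||^2  s.t.  ||w_i - w_bar||^2 <= eps
    with a quadratic penalty lam * ||w_i - w_bar||^2 and alternating updates."""
    p = Xs[0].shape[1]
    lam = 1.0 / eps  # tighter constraint -> stronger pull toward the center
    ws = [np.zeros(p) for _ in Xs]
    w_bar = np.zeros(p)
    for _ in range(iters):
        for i, (X, y) in enumerate(zip(Xs, ys)):
            # Ridge-like closed form, shrinking toward the central function.
            A = X.T @ X + lam * np.eye(p)
            ws[i] = np.linalg.solve(A, X.T @ y + lam * w_bar)
        w_bar = np.mean(ws, axis=0)  # central function: mean of task solutions
    return ws, w_bar
```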
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with classical results on causal inference provides an effective and practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- A Lagrangian Duality Approach to Active Learning [119.36233726867992]
We consider the batch active learning problem, where only a subset of the training data is labeled.
We formulate the learning problem using constrained optimization, where each constraint bounds the performance of the model on labeled samples; a toy primal-dual sketch follows this entry.
We show, via numerical experiments, that our proposed approach performs similarly to or better than state-of-the-art active learning methods.
arXiv Detail & Related papers (2022-02-08T19:18:49Z)
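A toy sketch of the constrained-training side of such a formulation, assuming per-sample squared-error constraints handled by primal-dual updates; reading the resulting multipliers as a signal for what to label next is an extra assumption of this sketch, not a claim about the paper's selection rule.

```python
import numpy as np

def primal_dual_fit(X, y, eps=0.05, lr=0.1, steps=500):
    """Minimize L(w, lam) = sum_i lam_i * (loss_i - eps) for a linear model:
    gradient descent on w, projected gradient ascent on the multipliers lam."""
    n, p = X.shape
    w, lam = np.zeros(p), np.ones(n)
    for _ in range(steps):
        resid = X @ w - y
        w -= lr * (2 * X.T @ (lam * resid)) / n             # primal step
        lam = np.maximum(0.0, lam + lr * (resid**2 - eps))  # dual step
    # Large multipliers mark the samples whose constraints bind hardest.
    return w, lam
```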
- A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms [6.193838300896449]
We study transfer learning from a Bayesian perspective, where a parametric statistical model is used.
Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-09-03T08:43:29Z)
- An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning [44.320945743871285]
We present novel information-theoretic bounds on the average absolute value of the meta-generalization gap.
Our bounds explicitly capture the impact of task relatedness, the number of tasks, and the number of data samples per task on the meta-generalization gap.
arXiv Detail & Related papers (2021-01-21T01:38:16Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions so that they learn in their task-specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve learning performance on every other task.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)
- Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks [30.075430694663293]
We study the transfer learning process between two linear regression problems.
We examine a parameter transfer mechanism whereby a subset of the parameters of the target task solution is constrained to the values learned for a related source task, as sketched after this entry.
arXiv Detail & Related papers (2020-06-12T08:42:14Z)
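A minimal sketch of such a mechanism for linear regression, assuming the transferred subset is the first k coordinates; the coordinate split and the least-squares solver are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def transfer_subset_fit(X_t, y_t, beta_source, k):
    """Freeze the first k target coefficients at the source-task values and
    fit the remaining coordinates by least squares on the target residuals."""
    p = X_t.shape[1]
    beta = np.zeros(p)
    beta[:k] = beta_source[:k]             # constrained, transferred part
    resid = y_t - X_t[:, :k] @ beta[:k]    # explain what the transfer leaves over
    beta[k:] = np.linalg.lstsq(X_t[:, k:], resid, rcond=None)[0]
    return beta
```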
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks; a toy version of the structure regularizer follows this entry.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
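A toy version of a block-diagonal structure regularizer, assuming fixed task and feature group labels; TFCL is described as learning this collaborative grouping through the regularizer itself, so the sketch only illustrates the structural idea.

```python
import numpy as np

def block_diagonal_penalty(W, task_groups, feature_groups):
    """Penalize coefficients outside the task/feature co-clusters.

    W: (num_tasks, num_features) coefficient matrix.
    task_groups / feature_groups: assumed block labels for tasks and features.
    Mass outside matching blocks is penalized, pushing W toward a
    block-diagonal structure in which feature groups serve task groups."""
    T, F = W.shape
    mask = np.array([[task_groups[t] != feature_groups[f] for f in range(F)]
                     for t in range(T)])
    return np.sum(W[mask] ** 2)
```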