Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
- URL: http://arxiv.org/abs/2507.04194v1
- Date: Sun, 06 Jul 2025 00:03:34 GMT
- Title: Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
- Authors: Yuyang Deng, Samory Kpotufe
- Abstract summary: We consider the problem of designing an SGD procedure that alternates sampling between source and target data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step. We show that such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses.
- Score: 6.614418593039343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theoretical works on supervised transfer learning (STL) -- where the learner has access to labeled samples from both source and target distributions -- have for the most part focused on statistical aspects of the problem, while efficient optimization has received less attention. We consider the problem of designing an SGD procedure for STL that alternates sampling between source and target data, while maintaining statistical transfer guarantees without prior knowledge of the quality of the source data. A main algorithmic difficulty is in understanding how to design such an adaptive sub-sampling mechanism at each SGD step, to automatically gain from the source when it is informative, or bias towards the target and avoid negative transfer when the source is less informative. We show that such a mixed-sample SGD procedure is feasible for general prediction tasks with convex losses, rooted in tracking an abstract sequence of constrained convex programs that serve to maintain the desired transfer guarantees. We instantiate these results in the concrete setting of linear regression with square loss, and show that the procedure converges, at a $1/\sqrt{T}$ rate, to a solution whose statistical performance on the target is adaptive to the a priori unknown quality of the source. Experiments with synthetic and real datasets support the theory.
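As a toy illustration of the alternating-sampling scheme described in the abstract, the sketch below runs projected SGD on linear regression with square loss, drawing each stochastic gradient from either the source or the target sample. The mixing probability `p_source` is a fixed placeholder: the paper's actual procedure adapts the sub-sampling at each step by tracking constrained convex programs, which this sketch does not implement.

```python
import numpy as np

# Toy mixed-sample SGD for linear regression with square loss: each step
# draws its stochastic gradient from either the source or the target sample.
# NOTE: `p_source` is a fixed placeholder; the paper's procedure instead
# adapts the sub-sampling at each step via constrained convex programs.
def mixed_sample_sgd(Xs, ys, Xt, yt, T=5000, lr0=0.1, p_source=0.5, radius=10.0):
    rng = np.random.default_rng(0)
    w = np.zeros(Xs.shape[1])
    w_avg = np.zeros_like(w)
    for t in range(1, T + 1):
        if rng.random() < p_source:                  # draw a source example
            i = rng.integers(len(ys)); x, y = Xs[i], ys[i]
        else:                                        # draw a target example
            i = rng.integers(len(yt)); x, y = Xt[i], yt[i]
        grad = 2.0 * (x @ w - y) * x                 # square-loss gradient
        w = w - (lr0 / np.sqrt(t)) * grad            # 1/sqrt(t) step size
        norm = np.linalg.norm(w)
        if norm > radius:                            # project onto a norm ball
            w *= radius / norm
        w_avg += (w - w_avg) / t                     # running iterate average
    return w_avg
```

With iterate averaging and a $1/\sqrt{t}$ step size, projected SGD on a convex loss attains the $1/\sqrt{T}$ rate quoted in the abstract; what the sketch omits is precisely the adaptive choice between the two sample streams.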
Related papers
- Formal Bayesian Transfer Learning via the Total Risk Prior [1.8570591025615457]
We show how a particular instantiation of our prior leads to a Bayesian Lasso in a transformed coordinate system. We also demonstrate that recently proposed minimax-frequentist transfer learning techniques may be viewed as an approximate Maximum a Posteriori approach to our model.
arXiv Detail & Related papers (2025-07-31T17:55:16Z)
- Statistical Inference for Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss [9.054486124506521]
We study multi-source unsupervised domain adaptation, where labeled data are drawn from multiple source domains and only unlabeled data from a target domain. We propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over convex combinations of the conditional outcome distributions from the sources. We establish fast statistical convergence rates for the estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges.
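The toy sketch below captures only the minimax flavor of this worst-case objective for logistic regression, alternating a descent step on the classifier with an exponentiated-gradient ascent step on simplex weights over sources (a group-DRO-style surrogate). It is not the CG-DRO estimator or its inference procedure, and all names and step sizes are illustrative.

```python
import numpy as np

# Toy minimax training: minimize over logistic-regression parameters the
# worst-case cross-entropy across source domains, with simplex weights
# updated by exponentiated-gradient ascent (a group-DRO-style surrogate).
# This is illustrative only -- not the CG-DRO estimator or its inference.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def worst_case_train(sources, T=2000, lr_w=0.05, lr_q=0.1):
    d = sources[0][0].shape[1]
    w = np.zeros(d)                                # classifier parameters
    q = np.full(len(sources), 1.0 / len(sources))  # weights over sources
    for _ in range(T):
        losses, grads = [], []
        for X, y in sources:                       # per-source loss and gradient
            p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
            losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
            grads.append(X.T @ (p - y) / len(y))
        q *= np.exp(lr_q * np.array(losses))       # up-weight the hardest source
        q /= q.sum()
        w -= lr_w * sum(qi * g for qi, g in zip(q, grads))
    return w, q
```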
arXiv Detail & Related papers (2025-07-14T04:21:23Z)
- Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck [28.661084093544684]
We propose a novel approach based on the information bottleneck (IB) principle and invariant risk minimization (IRM) framework.
The proposed method aims to extract compact and informative features that generalize effectively under domain shift.
We show that the proposed scheme outperforms state-of-the-art approaches and achieves a better rate-distortion tradeoff.
arXiv Detail & Related papers (2024-05-15T17:07:55Z)
- Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z)
- Robust Transfer Learning with Unreliable Source Data [11.813197709246289]
We introduce a novel quantity called the "ambiguity level" that measures the discrepancy between the target and source regression functions. We propose a simple transfer learning procedure, and establish a general theorem that shows how this new quantity is related to the transferability of learning.
arXiv Detail & Related papers (2023-10-06T21:50:21Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability [3.908842679355255]
Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing the learner to leverage a previous task, named the source, when learning a new one, the target.
This paper studies the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms.
arXiv Detail & Related papers (2023-05-31T09:38:21Z)
- MAPS: A Noise-Robust Progressive Learning Approach for Source-Free Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z)
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
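The pretraining-finetuning pipeline this paper analyzes can be sketched in a few lines for linear regression: fit the source data first, then take gradient steps on the target starting from the source solution. The ridge regularizer, step size, and step count below are arbitrary stand-ins, not the paper's choices.

```python
import numpy as np

# Sketch of the pretraining-finetuning pipeline for linear regression under
# covariate shift: fit ridge regression on the source, then take a few
# gradient steps on the target square loss starting from the source solution.
# The regularizer, step size, and step count are arbitrary illustrations.
def pretrain_finetune(Xs, ys, Xt, yt, reg=1e-2, steps=100, lr=0.01):
    d = Xs.shape[1]
    w = np.linalg.solve(Xs.T @ Xs + reg * np.eye(d), Xs.T @ ys)  # pretrain
    for _ in range(steps):                                       # finetune
        w -= lr * 2.0 * Xt.T @ (Xt @ w - yt) / len(yt)
    return w
```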
arXiv Detail & Related papers (2022-08-03T05:59:49Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
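A minimal version of the ATC recipe, under the assumption that confidence is the max softmax probability (the paper also studies other scores): pick the threshold on labeled source data so the confident fraction matches source accuracy, then report the confident fraction on unlabeled target data.

```python
import numpy as np

# Minimal version of the ATC recipe, taking confidence to be the max softmax
# probability (an assumption; the paper also studies other scores): choose a
# threshold on labeled source data so the confident fraction matches source
# accuracy, then report the confident fraction on unlabeled target data.
def atc_predict_accuracy(source_probs, source_labels, target_probs):
    src_conf = source_probs.max(axis=1)
    src_acc = (source_probs.argmax(axis=1) == source_labels).mean()
    t = np.quantile(src_conf, 1.0 - src_acc)      # P(conf > t) ~= src_acc
    return (target_probs.max(axis=1) > t).mean()  # predicted target accuracy
```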
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Transfer Learning under High-dimensional Generalized Linear Models [7.675822266933702]
We study the transfer learning problem under high-dimensional generalized linear models.
We propose an oracle algorithm and derive its $\ell$-estimation error bounds.
When it is unknown which sources to transfer from, an algorithm-free transferable source detection approach is introduced.
arXiv Detail & Related papers (2021-05-29T15:39:43Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
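A generic sketch of one distributionally robust training step in this spirit: an inner ascent step perturbs the inputs (a common surrogate for a Wasserstein-ball worst case), and the outer step descends on the loss at the perturbed points. This illustrates the robustness mechanism only; the paper's privacy-preserving algorithms are not reproduced here.

```python
import numpy as np

# Generic sketch of one distributionally robust training step for logistic
# regression: an inner ascent step perturbs the inputs (a common surrogate
# for a Wasserstein-ball worst case), and the outer step descends on the
# loss at the perturbed points. Not the paper's privacy-preserving method.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def robust_logistic_step(w, X, y, lr=0.05, gamma=0.1):
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w[None, :]   # per-example d(loss)/d(input)
    X_adv = X + gamma * grad_x               # inner maximization (one step)
    p_adv = sigmoid(X_adv @ w)
    grad_w = X_adv.T @ (p_adv - y) / len(y)  # outer minimization gradient
    return w - lr * grad_w
```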
arXiv Detail & Related papers (2020-07-07T18:25:25Z)