Minimax optimal transfer learning for high-dimensional additive regression
- URL: http://arxiv.org/abs/2509.06308v2
- Date: Tue, 16 Sep 2025 08:59:48 GMT
- Title: Minimax optimal transfer learning for high-dimensional additive regression
- Authors: Seung Hyun Moon
- Abstract summary: We first introduce a target-only estimation procedure based on the smooth backfitting estimator with local linear smoothing. We then develop a novel two-stage estimation method within a transfer learning framework, and provide theoretical guarantees at both the population and empirical levels.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper studies high-dimensional additive regression under the transfer learning framework, where one observes samples from a target population together with auxiliary samples from different but potentially related regression models. We first introduce a target-only estimation procedure based on the smooth backfitting estimator with local linear smoothing. In contrast to previous work, we establish general error bounds under sub-Weibull($\alpha$) noise, thereby accommodating heavy-tailed error distributions. In the sub-exponential case ($\alpha=1$), we show that the estimator attains the minimax lower bound under regularity conditions, which requires a substantial departure from existing proof strategies. We then develop a novel two-stage estimation method within a transfer learning framework, and provide theoretical guarantees at both the population and empirical levels. Error bounds are derived for each stage under general tail conditions, and we further demonstrate that the minimax optimal rate is achieved when the auxiliary and target distributions are sufficiently close. All theoretical results are supported by simulation studies and real data analysis.
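The two-stage idea in the abstract (borrow strength from auxiliary samples, then correct the bias on the target sample) can be illustrated with a toy sketch. This is an assumption-laden simplification: it uses ridge-regularized linear regression in place of the paper's penalized smooth backfitting with local linear smoothing, and the function `two_stage_transfer` and all data-generating choices below are hypothetical, not from the paper.

```python
import numpy as np

def two_stage_transfer(X_src, y_src, X_tgt, y_tgt, lam=1.0):
    """Toy two-stage transfer estimator (linear sketch, NOT the paper's
    smooth-backfitting procedure): (1) pool auxiliary and target samples
    for a ridge pilot fit; (2) fit a ridge correction to the target
    residuals, returning pilot + correction."""
    X_pool = np.vstack([X_src, X_tgt])
    y_pool = np.concatenate([y_src, y_tgt])
    d = X_pool.shape[1]
    # Stage 1: pooled ridge estimate borrows strength from auxiliary data.
    w_pilot = np.linalg.solve(X_pool.T @ X_pool + lam * np.eye(d),
                              X_pool.T @ y_pool)
    # Stage 2: debias using the (small) target sample only.
    resid = y_tgt - X_tgt @ w_pilot
    delta = np.linalg.solve(X_tgt.T @ X_tgt + lam * np.eye(d),
                            X_tgt.T @ resid)
    return w_pilot + delta

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
# Auxiliary model is close to, but not equal to, the target model.
X_src = rng.normal(size=(500, 3))
y_src = X_src @ (w_true + 0.1) + rng.normal(size=500)
X_tgt = rng.normal(size=(50, 3))
y_tgt = X_tgt @ w_true + rng.normal(size=50)
w_hat = two_stage_transfer(X_src, y_src, X_tgt, y_tgt)
print(np.round(w_hat, 2))  # close to w_true when the models are similar
```

The second stage is what makes the transfer safe: when the auxiliary model drifts far from the target, the correction absorbs the discrepancy at the cost of relying on the smaller target sample.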
Related papers
- Learning bounds for doubly-robust covariate shift adaptation [8.24901041136559]
Distribution shift between the training domain and the test domain poses a key challenge for machine learning. The doubly-robust (DR) estimator combines density ratio estimation with a pilot regression model. This paper establishes the first non-asymptotic learning bounds for the DR estimator.
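The DR combination described above has a standard one-line form: a plug-in term from the pilot regression plus a density-ratio-weighted residual correction. The sketch below is a hedged illustration with a known analytic density ratio and a deliberately misspecified pilot; the function name `dr_target_mean` and the Gaussian setup are assumptions for the example, not from the paper.

```python
import numpy as np

def dr_target_mean(X_src, y_src, X_tgt, ratio, pilot):
    """Doubly-robust estimate of the target outcome mean under covariate
    shift: pilot model evaluated on target covariates, plus a density-
    ratio-weighted residual correction on the labeled source sample.
    Consistent if EITHER `ratio` or `pilot` is correct."""
    direct = pilot(X_tgt).mean()                           # plug-in term
    correction = (ratio(X_src) * (y_src - pilot(X_src))).mean()
    return direct + correction

rng = np.random.default_rng(1)
X_src = rng.normal(0.0, 1.0, size=20000)       # source covariates ~ N(0,1)
y_src = 2.0 * X_src + rng.normal(size=20000)   # true regression: y = 2x + noise
X_tgt = rng.normal(1.0, 1.0, size=20000)       # target covariates ~ N(1,1)
ratio = lambda x: np.exp(x - 0.5)              # exact N(1,1)/N(0,1) density ratio
pilot = lambda x: 1.5 * x                      # deliberately misspecified pilot
est = dr_target_mean(X_src, y_src, X_tgt, ratio, pilot)
print(round(est, 2))  # near the true target mean E[y] = 2
```

Because the density ratio here is exact, the weighted residual term repairs the pilot's bias; this is the double robustness the bounds in the paper quantify non-asymptotically.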
arXiv Detail & Related papers (2025-11-14T06:46:23Z)
- Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift [10.35788775775647]
We investigate the minimax lower bound of the problem when the source and target distributions are known. Our two-stage algorithm first trains an optimal estimator for the function under the source distribution, and then uses a likelihood-ratio reweighting procedure to calibrate the moment estimator. We further propose a truncated version of the estimator that ensures double robustness, and provide the corresponding upper bound.
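The reweighting-with-truncation step described above can be sketched directly: weight each source sample by a capped density ratio so that the empirical moment mimics an expectation under the target. The function name `reweighted_moment`, the Gaussian shift, and the truncation level are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def reweighted_moment(x_src, ratio, k=2, trunc=None):
    """Likelihood-ratio reweighted k-th moment estimator under covariate
    shift (hedged sketch). Truncating the weights at `trunc` tames the
    variance contributed by extreme density-ratio values."""
    w = ratio(x_src)
    if trunc is not None:
        w = np.minimum(w, trunc)   # cap extreme weights
    return np.mean(w * x_src ** k)

rng = np.random.default_rng(2)
x_src = rng.normal(0.0, 1.0, size=50000)     # source ~ N(0,1)
ratio = lambda x: np.exp(0.5 * x - 0.125)    # exact N(0.5,1)/N(0,1) density ratio
m2 = reweighted_moment(x_src, ratio, k=2, trunc=20.0)
print(round(m2, 2))  # near E[X^2] = 1.25 under the N(0.5, 1) target
```

With a mild shift the truncation rarely binds, so the estimate stays nearly unbiased; under larger shifts the cap trades a small bias for a controlled variance, which is the robustness the upper bound is about.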
arXiv Detail & Related papers (2025-06-30T01:32:36Z)
- Minimax Optimality of the Probability Flow ODE for Diffusion Models [8.15094483029656]
This work develops the first end-to-end theoretical framework for deterministic ODE-based samplers. We propose a smooth regularized score estimator that simultaneously controls both the $L^2$ score error and the associated mean Jacobian error. We demonstrate that the resulting sampler achieves the minimax rate in total variation distance, modulo logarithmic factors.
arXiv Detail & Related papers (2025-03-12T17:51:29Z)
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers. We show that score mismatches induce a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result applies directly to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Universality in Transfer Learning for Linear Models [18.427215139020625]
We study the problem of transfer learning and fine-tuning in linear models for both regression and binary classification. In particular, we consider the use of stochastic gradient descent (SGD) on a linear model with pretrained weights, fine-tuned with a small training data set from the target distribution.
arXiv Detail & Related papers (2024-10-03T03:09:09Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek an idealized data distribution that maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Near-optimal inference in adaptive linear regression [60.08422051718195]
Even simple methods like least squares can exhibit non-normal behavior when data is collected in an adaptive manner.
We propose a family of online debiasing estimators to correct these distributional anomalies in least squares estimation.
We demonstrate the usefulness of our theory via applications to multi-armed bandit, autoregressive time series estimation, and active learning with exploration.
arXiv Detail & Related papers (2021-07-05T21:05:11Z)
- Distribution-Free Robust Linear Regression [5.532477732693]
We study random design linear regression with no assumptions on the distribution of the covariates.
We construct a non-linear estimator achieving excess risk of order $d/n$ with the optimal sub-exponential tail.
We prove an optimal version of the classical bound for the truncated least squares estimator due to Győrfi, Kohler, Krzyżak, and Walk.
arXiv Detail & Related papers (2021-02-25T15:10:41Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models that is effective and stable when operating over the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.