Smoothness Adaptive Hypothesis Transfer Learning
- URL: http://arxiv.org/abs/2402.14966v1
- Date: Thu, 22 Feb 2024 21:02:19 GMT
- Title: Smoothness Adaptive Hypothesis Transfer Learning
- Authors: Haotian Lin, Matthew Reimherr
- Abstract summary: Smoothness Adaptive Transfer Learning (SATL) is a two-phase kernel ridge regression (KRR)-based algorithm.
We first prove that employing the misspecified fixed bandwidth Gaussian kernel in target-only KRR learning can achieve minimax optimality.
We derive the minimax lower bound of the learning problem in excess risk and show that SATL enjoys a matching upper bound up to a logarithmic factor.
- Score: 8.557392136621894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many existing two-phase kernel-based hypothesis transfer learning algorithms
employ the same kernel regularization across phases and rely on the known
smoothness of functions to obtain optimality. Therefore, they fail to adapt to
the varying and unknown smoothness between the target/source and their offset
in practice. In this paper, we address these problems by proposing Smoothness
Adaptive Transfer Learning (SATL), a two-phase kernel ridge
regression (KRR)-based algorithm. We first prove that employing the misspecified
fixed bandwidth Gaussian kernel in target-only KRR learning can achieve minimax
optimality and derive an adaptive procedure to the unknown Sobolev smoothness.
Leveraging these results, SATL employs Gaussian kernels in both phases so that
the estimators can adapt to the unknown smoothness of the target/source and
their offset function. We derive the minimax lower bound of the learning
problem in excess risk and show that SATL enjoys a matching upper bound up to a
logarithmic factor. The minimax convergence rate sheds light on the factors
influencing transfer dynamics and demonstrates the superiority of SATL compared
to non-transfer learning settings. While our main objective is a theoretical
analysis, we also conduct several experiments to confirm our results.
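As a purely illustrative companion to the abstract, the following is a minimal sketch of the two-phase KRR idea: fit the source data with a fixed-bandwidth Gaussian-kernel KRR, then fit only the offset on the scarcer target data. The bandwidths and regularization levels are hypothetical placeholders, not the smoothness-adaptive choices SATL actually derives.

```python
# Minimal sketch of a two-phase kernel ridge regression (KRR) transfer scheme with
# fixed-bandwidth Gaussian kernels. Bandwidths and regularizers are illustrative
# placeholders, not the adaptive choices analyzed in the paper.
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    # Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * bandwidth^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def krr_fit(X, y, bandwidth, lam):
    # Solve (K + n*lam*I) alpha = y and return the predictor x -> K(x, X) @ alpha.
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, bandwidth) @ alpha

rng = np.random.default_rng(0)

# Phase 1: estimate the source regression function from abundant source data.
Xs = rng.uniform(0, 1, (500, 1))
ys = np.sin(2 * np.pi * Xs[:, 0]) + 0.1 * rng.standard_normal(500)
f_source = krr_fit(Xs, ys, bandwidth=0.1, lam=1e-3)

# Phase 2: fit only the offset (target response minus transferred source fit)
# on the scarce target data.
Xt = rng.uniform(0, 1, (50, 1))
yt = np.sin(2 * np.pi * Xt[:, 0]) + 0.3 * Xt[:, 0] + 0.1 * rng.standard_normal(50)
offset = krr_fit(Xt, yt - f_source(Xt), bandwidth=0.3, lam=1e-2)

f_target = lambda X: f_source(X) + offset(X)  # final target predictor
```

In SATL the key point is that both phases use Gaussian kernels so each estimator can adapt to the unknown Sobolev smoothness of the source, target, and offset; the fixed constants above merely stand in for that adaptive tuning.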
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
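For readers unfamiliar with the technique, here is a generic sketch of error feedback combined with a normalized update; the top-k compressor, single worker, and stepsize schedule are illustrative simplifications rather than the algorithms analyzed in the cited paper.

```python
# Generic sketch of error feedback with a normalized update direction.
import numpy as np

def top_k(v, k):
    # Keep the k largest-magnitude coordinates, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_error_feedback(grad_fn, x0, steps=500, lr0=0.5, k=2):
    x = x0.astype(float).copy()
    e = np.zeros_like(x)                 # error buffer: what compression discarded so far
    for t in range(1, steps + 1):
        g = grad_fn(x)
        c = top_k(g + e, k)              # compress the gradient plus accumulated error
        e = g + e - c                    # remember the part the compressor threw away
        norm = np.linalg.norm(c)
        if norm > 0:
            x -= (lr0 / np.sqrt(t)) * c / norm  # normalized step with a decaying stepsize
    return x

# Example: minimize 0.5 * ||x||^2, whose gradient is x itself.
x_final = normalized_error_feedback(lambda x: x, np.ones(10))
```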
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms [8.681909776958184]
We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems.
We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs.
We investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm on the least squares problem.
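AdaGrad-Norm itself is easy to state; below is a hedged sketch on a synthetic least-squares problem. The base stepsize and accumulator initialization are illustrative placeholders, and the exact risk curves derived in the cited paper are not reproduced.

```python
# Illustrative AdaGrad-Norm on a least-squares problem: a single scalar stepsize
# scaled by the running sum of squared gradient norms.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true

x = np.zeros(20)
eta, b2 = 1.0, 1e-8                      # base stepsize and accumulator initialization
for _ in range(500):
    g = A.T @ (A @ x - b) / len(b)       # least-squares gradient
    b2 += np.linalg.norm(g) ** 2         # accumulate the squared gradient norm
    x -= eta / np.sqrt(b2) * g           # AdaGrad-Norm update
print(np.linalg.norm(x - x_true))        # distance to the true solution
```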
arXiv Detail & Related papers (2024-05-30T00:27:52Z)
- Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization [10.009748368458409]
We present a framework for smooth optimization of explicitly regularized objectives for (structured) sparsity.
Our method enables fully differentiable approximation-free optimization and is thus compatible with the ubiquitous gradient descent paradigm in deep learning.
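A minimal sketch of the underlying idea, under the usual observation that writing w = u ⊙ v and penalizing the squared norms of u and v induces an $\ell_1$-type penalty on w while keeping the objective smooth; the specific parametrizations and structured penalties of the cited framework are not reproduced here.

```python
# Minimal sketch of Hadamard-product overparametrization for sparse regression:
# parametrize w = u * v and penalize (||u||^2 + ||v||^2) / 2, which behaves like an
# l1 penalty on w while the training objective stays smooth. Hyperparameters are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
w_true = np.zeros(50); w_true[:5] = 1.0
y = A @ w_true

u = np.full(50, 0.1)
v = np.full(50, 0.1)
lam, lr = 0.1, 1e-2
for _ in range(2000):
    w = u * v
    r = A @ w - y                        # residual of the smooth data-fit term
    grad_w = A.T @ r / len(y)
    u -= lr * (grad_w * v + lam * u)     # chain rule through w = u * v, plus ridge on u
    v -= lr * (grad_w * u + lam * v)     # uses the freshly updated u; fine for a sketch
print(np.round(u * v, 2)[:10])           # inspect the first few recovered coefficients
```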
arXiv Detail & Related papers (2023-07-07T13:06:12Z)
- On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
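The spectral effect mentioned above can be seen on a tiny finite-dimensional quadratic. The worked example below (not taken from the paper) checks that gradient descent with stepsize eta, stopped after t steps, acts as the filter (1 - (1 - eta*lambda)^t) / lambda on each eigendirection, so the learning rate and the stopping time jointly decide which directions get fit.

```python
# Worked illustration: early-stopped gradient descent on f(w) = 0.5*w^T H w - b^T w
# is a spectral filter on the eigendecomposition of H.
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal eigenbasis
lams = np.array([1.0, 0.5, 0.1, 0.01, 0.001])      # well-separated eigenvalues
H = Q @ np.diag(lams) @ Q.T
b = rng.standard_normal(5)

eta, t = 0.9, 50
w = np.zeros(5)
for _ in range(t):
    w -= eta * (H @ w - b)                         # plain gradient descent from zero

filt = (1 - (1 - eta * lams) ** t) / lams          # closed-form spectral filter
w_closed = Q @ (filt * (Q.T @ b))
print(np.allclose(w, w_closed))                    # True: iterates match the filter form
```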
arXiv Detail & Related papers (2022-02-28T13:01:04Z)
- Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Minimax Problems [80.46370778277186]
Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks.
We develop a communication-efficient distributed extragradient algorithm, LocalAdaSient, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model.
We demonstrate its efficacy through several experiments in both the homogeneous and heterogeneous settings.
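As background only, here is a single-machine extragradient sketch for a toy bilinear convex-concave problem; the distributed, adaptive-stepsize LocalAdaSient algorithm of the cited paper is considerably more involved and is not reproduced here.

```python
# Single-machine extragradient sketch for the bilinear saddle problem f(x, y) = x^T B y.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
x, y = np.ones(5), np.ones(5)
eta = 0.1
for _ in range(1000):
    gx, gy = B @ y, B.T @ x                        # grad_x f and grad_y f at (x, y)
    x_half, y_half = x - eta * gx, y + eta * gy    # extrapolation (look-ahead) step
    gx, gy = B @ y_half, B.T @ x_half              # gradients at the look-ahead point
    x, y = x - eta * gx, y + eta * gy              # update using look-ahead gradients
print(np.linalg.norm(x), np.linalg.norm(y))        # distance to the saddle point at zero
```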
arXiv Detail & Related papers (2021-06-18T09:42:05Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with a dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
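A generic sketch of a clipped subgradient step under heavy-tailed noise is shown below; the clipping threshold and stepsize rule are illustrative placeholders rather than the novel rules proposed in the cited paper.

```python
# Generic sketch of a clipped subgradient method under heavy-tailed gradient noise.
import numpy as np

def clip(g, tau):
    # Rescale g so its Euclidean norm is at most tau.
    n = np.linalg.norm(g)
    return g if n <= tau else g * (tau / n)

rng = np.random.default_rng(0)
x = np.ones(10)
for t in range(1, 1001):
    noise = rng.standard_t(df=3, size=10)          # heavy-tailed (Student-t) noise
    g = np.sign(x) + noise                         # noisy subgradient of ||x||_1
    x -= (1.0 / np.sqrt(t)) * clip(g, tau=1.0)     # clipped step with a decaying stepsize
print(np.linalg.norm(x, 1))                        # objective value after the run
```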
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
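For context, this is the standard two-point zeroth-order gradient estimator with random Gaussian directions; ZO-RL replaces the random directions with a learned sampling policy, which is not reproduced in this sketch.

```python
# Basic two-point zeroth-order (ZO) gradient estimator with random Gaussian directions.
import numpy as np

rng = np.random.default_rng(0)

def zo_gradient(f, x, num_dirs=20, mu=1e-3):
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)                 # random perturbation direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs                                  # averaged directional estimate

f = lambda x: np.sum(x ** 2)                             # example objective
x = np.ones(10)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)                        # ZO gradient descent
print(f(x))                                              # objective value after the run
```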
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Towards Understanding Label Smoothing [36.54164997035046]
Label smoothing regularization (LSR) has seen great success in training deep neural networks.
We show that an appropriate LSR can help to speed up convergence by reducing the variance.
We propose a simple yet effective strategy, namely the Two-Stage LAbel smoothing algorithm (TSLA).
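Label smoothing itself is a one-line modification of the cross-entropy target: mix the one-hot label with the uniform distribution. The sketch below also switches smoothing off in a second stage; the particular two-stage schedule is an illustrative placeholder, not the TSLA schedule analyzed in the paper.

```python
# Smoothed cross-entropy: the target puts (1 - eps) on the true class and spreads
# eps uniformly over all classes.
import numpy as np

def smoothed_cross_entropy(logits, label, eps):
    num_classes = logits.shape[-1]
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log-softmax
    return -np.sum(target * log_probs)

logits = np.array([2.0, 0.5, -1.0])
for epoch in range(10):
    eps = 0.1 if epoch < 7 else 0.0       # stage 1: smoothing on; stage 2: smoothing off
    loss = smoothed_cross_entropy(logits, label=0, eps=eps)
    # In two-stage training this loss would drive parameter updates; here we only
    # evaluate it to show the schedule.
```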
arXiv Detail & Related papers (2020-06-20T20:36:17Z)
- Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training [2.9649783577150837]
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory.
We derive analytical expressions for the maximal descent and adaptive training regimens for smooth, non-Newton deep neural networks.
We validate our claims on the VGG/ResNet architectures and the ImageNet dataset.
arXiv Detail & Related papers (2020-06-16T11:55:45Z)
- The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
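As a rough illustration, the following sketch applies Nesterov-style extrapolation to a subgradient method on an $\ell_1$-regularized hinge loss; the stepsize and momentum schedules are illustrative placeholders, not the schedule for which the cited paper proves optimal individual convergence.

```python
# Sketch of a subgradient method with Nesterov-style extrapolation on a nonsmooth
# objective: mean hinge loss plus an l1 penalty.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
labels = np.sign(A @ rng.standard_normal(20))
lam = 0.01

def subgrad(w):
    margins = labels * (A @ w)
    active = (margins < 1).astype(float)               # examples violating the margin
    g_hinge = -(A * (labels * active)[:, None]).mean(axis=0)
    return g_hinge + lam * np.sign(w)                  # subgradient of hinge + l1

w_prev = w = np.zeros(20)
for k in range(1, 2001):
    y = w + (k - 1) / (k + 2) * (w - w_prev)           # Nesterov extrapolation step
    w_prev, w = w, y - (0.5 / np.sqrt(k)) * subgrad(y) # subgradient step at the extrapolated point
```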
arXiv Detail & Related papers (2020-06-08T03:35:41Z)