LQF: Linear Quadratic Fine-Tuning
- URL: http://arxiv.org/abs/2012.11140v1
- Date: Mon, 21 Dec 2020 06:40:20 GMT
- Title: LQF: Linear Quadratic Fine-Tuning
- Authors: Alessandro Achille, Aditya Golatkar, Avinash Ravichandran, Marzia
Polito, Stefano Soatto
- Abstract summary: We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
- Score: 114.3840147070712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifiers that are linear in their parameters, and trained by optimizing a
convex loss function, have predictable behavior with respect to changes in the
training data, initial conditions, and optimization. Such desirable properties
are absent in deep neural networks (DNNs), typically trained by non-linear
fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have
led to interesting theoretical insights, but have not impacted the practice due
to the substantial performance gap compared to standard non-linear
optimization. We present the first method for linearizing a pre-trained model
that achieves comparable performance to non-linear fine-tuning on most of the
real-world image classification tasks tested, thus enjoying the
interpretability of linear models without incurring punishing losses in
performance. LQF consists of simple modifications to the architecture, loss
function and optimization typically used for classification: Leaky-ReLU instead
of ReLU, mean squared loss instead of cross-entropy, and pre-conditioning using
Kronecker factorization. None of these changes in isolation is sufficient to
approach the performance of non-linear fine-tuning. When used in combination,
they allow us to reach comparable performance, and even superior in the
low-data regime, while enjoying the simplicity, robustness and interpretability
of linear-quadratic optimization.
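To make the three ingredients concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation). It shows two of the modifications directly, swapping ReLU for Leaky-ReLU in a pre-trained backbone and training with a mean-squared loss on one-hot targets, while the Kronecker-factored pre-conditioner is omitted and plain SGD stands in for it. The backbone, class count and hyperparameters are assumptions; note also that LQF applies these ingredients to a linearized version of the pre-trained model (which is what makes the problem linear-quadratic), whereas the sketch uses the ordinary non-linear network purely for illustration.

```python
# Minimal sketch of two of the three LQF modifications (all choices below are assumptions).
import torch
import torch.nn as nn
import torchvision.models as models


def relu_to_leaky(module: nn.Module, negative_slope: float = 0.01) -> None:
    """Recursively replace every nn.ReLU with nn.LeakyReLU (modification 1)."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(negative_slope, inplace=True))
        else:
            relu_to_leaky(child, negative_slope)


num_classes = 10                                  # assumed downstream task size
model = models.resnet18(weights="DEFAULT")        # assumed ImageNet pre-trained backbone
relu_to_leaky(model)                              # Leaky-ReLU instead of ReLU
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.MSELoss()                          # modification 2: MSE instead of cross-entropy
# Modification 3 (pre-conditioning via Kronecker factorization) is not shown;
# plain SGD is used here only as a stand-in.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative training step on random data.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_classes, (8,))
targets = nn.functional.one_hot(y, num_classes).float()   # regress one-hot labels

optimizer.zero_grad()
loss = criterion(model(x), targets)
loss.backward()
optimizer.step()
```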
Related papers
- Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures [14.93489065234423]
We present a general variational framework for the training of freeform nonlinearities in layered computational architectures.
The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility.
We show how to solve the numerical function-optimization problem by representing the nonlinearities in a suitable (nonuniform) B-spline basis.
arXiv Detail & Related papers (2024-08-23T14:39:27Z) - Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance could be attained by using nonlinear estimators such as deep neural networks.
In this paper, we control over-fitting by regularizing the FCNN model in terms of the norm of its intermediate representations.
Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
arXiv Detail & Related papers (2024-03-15T12:00:37Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training is a nonconvex optimization problem.
In this paper, we examine the use of convex (Lasso) models to characterize the neural network training landscape.
We show that all stationary points of the nonconvex objective can be characterized via the global optima of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Squares Value Iteration (PNLSVI), for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z) - Optimal Nonlinearities Improve Generalization Performance of Random
Features [0.9790236766474201]
A random feature model with a nonlinear activation function has been shown to be equivalent to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
arXiv Detail & Related papers (2023-09-28T20:55:21Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets (see the sketch of a rank-R layer after this list).
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive neural network models.
We show how the interplay between the optimization geometry and the feature geometry determines out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z)
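As an aside on the Rank-R FNN entry above, the following is a minimal, hypothetical sketch (not the authors' code) of a rank-R, CP-factorized weight acting on a matrix-valued input; the class name, shapes, initialization, and final sigmoid are assumptions made only for illustration.

```python
# Hypothetical rank-R layer: the weight is constrained to the CP form W = sum_r a_r b_r^T,
# so the layer computes <W, X> = sum_r a_r^T X b_r for a matrix-valued input X.
import torch
import torch.nn as nn


class RankRLayer(nn.Module):
    def __init__(self, in_rows: int, in_cols: int, rank: int):
        super().__init__()
        self.a = nn.Parameter(0.01 * torch.randn(rank, in_rows))  # CP factors along rows
        self.b = nn.Parameter(0.01 * torch.randn(rank, in_cols))  # CP factors along columns

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_rows, in_cols) -> one score per sample, no vectorization needed.
        return torch.einsum("ri,bij,rj->b", self.a, x, self.b)


layer = RankRLayer(in_rows=8, in_cols=8, rank=3)
scores = torch.sigmoid(layer(torch.randn(4, 8, 8)))  # 4 matrix-valued samples
```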
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.