On Newton Screening
- URL: http://arxiv.org/abs/2001.10616v3
- Date: Fri, 21 Apr 2023 04:52:05 GMT
- Title: On Newton Screening
- Authors: Jian Huang, Yuling Jiao, Lican Kang, Jin Liu, Yanyan Liu, Xiliang Lu,
and Yuanyuan Yang
- Abstract summary: We develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
- Score: 14.040371216692645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Screening and working set techniques are important approaches to reducing the
size of an optimization problem. They have been widely used in accelerating
first-order methods for solving large-scale sparse learning problems. In this
paper, we develop a new screening method called Newton screening (NS) which is
a generalized Newton method with a built-in screening mechanism. We derive an
equivalent KKT system for the Lasso and utilize a generalized Newton method to
solve the KKT equations. Based on this KKT system, a built-in working set with
a relatively small size is first determined using the sum of primal and dual
variables generated from the previous iteration, then the primal variable is
updated by solving a least-squares problem on the working set and the dual
variable is updated based on a closed-form expression. Moreover, we consider a
sequential version of Newton screening (SNS) with a warm-start strategy. We
show that NS possesses an optimal convergence property in the sense that it
achieves one-step local convergence. Under certain regularity conditions on the
feature matrix, we show that SNS hits a solution with the same signs as the
underlying true target and achieves a sharp estimation error bound with high
probability. Simulation studies and real data analysis support our theoretical
results and demonstrate that SNS is faster and more accurate than several
state-of-the-art methods in our comparative studies.
Related papers
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
First-order descent methods and their variants, such as Adam and AdaGrad, are commonly used in the field of deep learning.
However, these methods do not exploit curvature information.
Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
- Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian [53.044526424637866]
We consider an unconstrained nonconvex optimization problem of minimizing a twice differentiable objective function with a Hölder continuous Hessian.
We first propose a Newton-conjugate gradient (Newton-CG) method for finding an approximate first- and second-order stationary point.
We present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method.
arXiv Detail & Related papers (2023-11-22T01:50:43Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
- SCORE: Approximating Curvature Information under Self-Concordant Regularization [0.0]
We propose a generalized Gauss-Newton with Self-Concordant Regularization (GGN-SCORE) algorithm that updates the minimization variables each time it receives a new input batch.
The proposed algorithm exploits the structure of the second-order information in the Hessian matrix, thereby reducing computational overhead.
arXiv Detail & Related papers (2021-12-14T13:03:04Z)
- Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update [88.73437209862891]
In second-order optimization, a potential bottleneck is computing the Hessian matrix of the objective function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
arXiv Detail & Related papers (2021-07-15T17:33:05Z)
- Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach [33.21301562890201]
We use an adaptive sample size scheme that exploits the superlinear convergence of quasi-Newton methods globally and throughout the learning process.
We show that if the initial sample size is sufficiently large and we use quasi-Newton methods to solve each subproblem, the subproblems can be solved superlinearly fast.
arXiv Detail & Related papers (2021-06-10T01:08:51Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to optimize a shared objective.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds for the setting in which no Byzantine machines are present.
It is very practical: it improves upon the performance of all prior methods.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.