Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive
Sample Size Approach
- URL: http://arxiv.org/abs/2106.05445v1
- Date: Thu, 10 Jun 2021 01:08:51 GMT
- Title: Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive
Sample Size Approach
- Authors: Qiujiang Jin, Aryan Mokhtari
- Abstract summary: We use an adaptive sample size scheme that exploits the superlinear convergence of quasi-Newton methods globally and throughout the learning process.
We show that if the initial sample size is sufficiently large and we use quasi-Newton methods to solve each subproblem, the subproblems can be solved superlinearly fast.
- Score: 33.21301562890201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the application of quasi-Newton methods for solving
empirical risk minimization (ERM) problems defined over a large dataset.
Traditional deterministic and stochastic quasi-Newton methods can be executed
to solve such problems; however, it is known that their global convergence rate
may not be better than first-order methods, and their local superlinear
convergence only appears towards the end of the learning process. In this
paper, we use an adaptive sample size scheme that exploits the superlinear
convergence of quasi-Newton methods globally and throughout the entire learning
process. The main idea of the proposed adaptive sample size algorithms is to
start with a small subset of data points and solve their corresponding ERM
problem within its statistical accuracy, and then enlarge the sample size
geometrically and use the optimal solution of the problem corresponding to the
smaller set as an initial point for solving the subsequent ERM problem with
more samples. We show that if the initial sample size is sufficiently large and
we use quasi-Newton methods to solve each subproblem, the subproblems can be
solved superlinearly fast (after at most three iterations), as we guarantee
that the iterates always stay within a neighborhood in which quasi-Newton methods
converge superlinearly. Numerical experiments on various datasets confirm our
theoretical results and demonstrate the computational advantages of our method.
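To make the scheme concrete, the following minimal Python sketch assumes a doubling schedule, SciPy's BFGS as a stand-in for the paper's quasi-Newton subsolver, and an $O(1/\sqrt{n})$ gradient tolerance as a proxy for the statistical accuracy of an $n$-sample subproblem; the paper's actual constants and stopping rules differ.

```python
import numpy as np
from scipy.optimize import minimize

def adaptive_sample_size_erm(loss_fn, grad_fn, data, dim, n0=128, growth=2):
    """Sketch: solve ERM on a small subsample to within its statistical
    accuracy, then geometrically enlarge the sample, warm-starting each
    subproblem at the previous solution."""
    n_total = len(data)
    w = np.zeros(dim)                      # initial iterate
    n = min(n0, n_total)
    while True:
        subset = data[:n]                  # current ERM subproblem
        # gtol ~ statistical accuracy of n samples; O(1/sqrt(n)) is a
        # common proxy, not the paper's exact rule.
        res = minimize(loss_fn, w, args=(subset,), jac=grad_fn,
                       method="BFGS", options={"gtol": 1.0 / np.sqrt(n)})
        w = res.x                          # warm start for the next subproblem
        if n == n_total:
            return w
        n = min(growth * n, n_total)       # enlarge the sample geometrically
```

Each call starts from the previous solution, which is what keeps the iterates inside the superlinear-convergence neighborhood of the next, larger subproblem.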
Related papers
- A Sample Efficient Alternating Minimization-based Algorithm For Robust Phase Retrieval [56.67706781191521]
In this work, we study a robust phase retrieval problem where the task is to recover an unknown signal.
Our proposed oracle avoids the need for computationally expensive spectral descent, using a simple gradient step, and is robust to outliers.
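For intuition only, here is one gradient step on the standard amplitude-based least-squares surrogate; the entry does not specify the paper's oracle, so the loss, names, and step size below are illustrative assumptions.

```python
import numpy as np

def amplitude_flow_step(x, A, y, lr=0.1):
    """One gradient step on f(x) = (1/2m) * sum_i (|a_i^T x| - y_i)^2,
    a standard amplitude-based phase retrieval surrogate."""
    m = len(y)
    Ax = A @ x
    grad = A.T @ ((np.abs(Ax) - y) * np.sign(Ax)) / m   # subgradient of f
    return x - lr * grad
```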
arXiv Detail & Related papers (2024-09-07T06:37:23Z)
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
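In assumed notation, the objective is $\min_{h \in \mathcal{H}} \max_{i \in [k]} \mathbb{E}_{z \sim \mathcal{D}_i}[\ell(h, z)]$, where $d$ measures the complexity of the hypothesis class $\mathcal{H}$ (e.g., its VC dimension); an $\varepsilon$-optimal randomized hypothesis is then obtained from $O((d+k)/\varepsilon^2)$ samples.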
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Robust empirical risk minimization via Newton's method [9.797319790710711]
A new variant of Newton's method for empirical risk minimization is studied.
The gradient and Hessian of the objective function are replaced by robust estimators.
An algorithm for obtaining robust Newton directions based on the conjugate gradient method is also proposed.
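A hedged sketch of such a step: coordinate-wise median-of-means below is a generic stand-in for the paper's robust estimators (which this summary does not name), and the Newton direction is obtained with SciPy's conjugate gradient solver.

```python
import numpy as np
from scipy.sparse.linalg import cg

def median_of_means(samples, n_blocks=10):
    """Coordinate-wise median-of-means over axis 0 -- a generic robust
    estimator used here as a placeholder."""
    blocks = np.array_split(samples, n_blocks)
    return np.median(np.stack([b.mean(axis=0) for b in blocks]), axis=0)

def robust_newton_step(w, per_sample_grads, per_sample_hessians):
    """One Newton step w <- w - H^{-1} g with robust estimates of the
    gradient g and Hessian H; the direction solves H d = g by CG
    (assumes the estimated H is positive definite)."""
    g = median_of_means(per_sample_grads)       # robust gradient estimate
    H = median_of_means(per_sample_hessians)    # robust Hessian estimate
    d, info = cg(H, g)                          # Newton direction via CG
    return w - d
```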
arXiv Detail & Related papers (2023-01-30T18:54:54Z)
- A Neural Network Warm-Start Approach for the Inverse Acoustic Obstacle Scattering Problem [7.624866197576227]
We present a neural network warm-start approach for solving the inverse scattering problem.
An initial guess for the optimization problem is obtained using a trained neural network.
The algorithm remains robust to noise in the scattered field measurements and also converges to the true solution for limited aperture data.
arXiv Detail & Related papers (2022-12-16T22:18:48Z)
- A framework for bilevel optimization that enables stochastic and global variance reduction algorithms [17.12280360174073]
Bilevel optimization is the problem of minimizing a value function that involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(1/T)$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
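In assumed notation, the problem is $\min_x f(x, z^*(x))$ with $z^*(x) = \arg\min_z g(x, z)$; the implicit gradient involves the linear system $\nabla^2_{zz} g(x, z^*)\, v = \nabla_z f(x, z^*)$, and the framework's point is to advance $x$, the inner iterate $z$, and the linear-system iterate $v$ together, rather than solving the inner problem and the system to completion at every step.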
arXiv Detail & Related papers (2022-01-31T18:17:25Z)
- New Methods for Detecting Concentric Objects With High Accuracy [0.0]
Fitting geometric objects to digitized data is an important problem in many areas such as iris detection, autonomous navigation, and industrial robotics operations.
There are two common approaches to fitting geometric shapes to data: the geometric (iterative) approach and the algebraic (non-iterative) approach.
We develop new estimators, which can be used as reliable initial guesses for other iterative methods.
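As one concrete instance of the algebraic approach, and of the kind of non-iterative initial guess the entry describes (the paper's new estimators are not spelled out in this summary, so a standard Kåsa-style circle fit stands in here):

```python
import numpy as np

def kasa_circle_fit(x, y):
    """Algebraic (non-iterative) circle fit: solve the linear least-squares
    system  a*x + b*y + c = -(x^2 + y^2)  for the circle parameters."""
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a / 2, -b / 2                  # circle center
    r = np.sqrt(cx**2 + cy**2 - c)           # circle radius
    return cx, cy, r
```

The output can seed an iterative geometric fit, exactly the "reliable initial guess" role described above.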
arXiv Detail & Related papers (2021-02-16T08:19:18Z)
- Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the interpolation weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
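In assumed notation, each generated point is an affine combination of its $k$ nearest input points, $\hat{p} = \sum_{i=1}^{k} w_i p_i$ with $\sum_{i} w_i = 1$, and the network predicts both the weights $w_i$ and a refinement term that compensates the high-order approximation error.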
arXiv Detail & Related papers (2020-11-25T14:00:18Z)
- Boosting Data Reduction for the Maximum Weight Independent Set Problem Using Increasing Transformations [59.84561168501493]
We introduce new generalized data reduction and transformation rules for the maximum weight independent set problem.
Surprisingly, these so-called increasing transformations can simplify the problem and also open up the reduction space to yield even smaller irreducible graphs later in the algorithm.
Our algorithm computes significantly smaller irreducible graphs on all except one instance, solves more instances to optimality than previously possible, is up to two orders of magnitude faster than the best state-of-the-art solver, and finds higher-quality solutions than solvers DynWVC and HILS.
arXiv Detail & Related papers (2020-08-12T08:52:50Z)
- On Newton Screening [14.040371216692645]
We develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
arXiv Detail & Related papers (2020-01-27T11:25:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.