Automatically Bounding the Taylor Remainder Series: Tighter Bounds and
New Applications
- URL: http://arxiv.org/abs/2212.11429v3
- Date: Wed, 2 Aug 2023 19:10:31 GMT
- Title: Automatically Bounding the Taylor Remainder Series: Tighter Bounds and
New Applications
- Authors: Matthew Streeter and Joshua V. Dillon
- Abstract summary: We present a new algorithm for automatically bounding the Taylor remainder series.
Our algorithm can make efficient use of machine learning hardware, and we provide an open source implementation in JAX.
In a companion paper we use our new machinery to create the first universal majorization-minimization optimization algorithms.
- Score: 10.824474741383723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new algorithm for automatically bounding the Taylor remainder
series. In the special case of a scalar function $f: \mathbb{R} \to
\mathbb{R}$, our algorithm takes as input a reference point $x_0$, trust region
$[a, b]$, and integer $k \ge 1$, and returns an interval $I$ such that $f(x) -
\sum_{i=0}^{k-1} \frac {1} {i!} f^{(i)}(x_0) (x - x_0)^i \in I (x - x_0)^k$ for
all $x \in [a, b]$. As in automatic differentiation, the function $f$ is
provided to the algorithm in symbolic form, and must be composed of known
atomic functions.
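For concreteness, the guarantee can be checked numerically in a simple special case. The sketch below is our own illustration, not the paper's algorithm: it takes $f = \exp$ and $k = 2$, where the remainder ratio $R(x) = (e^x - e^{x_0} - e^{x_0}(x - x_0))/(x - x_0)^2$ equals $\int_0^1 (1-t)\, e^{x_0 + t(x - x_0)}\, dt$ and is therefore increasing in $x$, so the sharp interval over a trust region $[a, b]$ is simply $I = [R(a), R(b)]$:

```python
import math

def exp_remainder_ratio(x, x0):
    # R(x) = (exp(x) - exp(x0) - exp(x0)*(x - x0)) / (x - x0)^2,
    # the degree-2 Taylor remainder of exp divided by (x - x0)^2.
    if abs(x - x0) < 1e-12:
        return math.exp(x0) / 2.0  # limiting value at x = x0
    return (math.exp(x) - math.exp(x0) - math.exp(x0) * (x - x0)) / (x - x0) ** 2

def exp_taylor_enclosure(x0, a, b):
    # For f = exp the remainder ratio R is increasing on [a, b],
    # so the sharp enclosure interval is I = [R(a), R(b)].
    return exp_remainder_ratio(a, x0), exp_remainder_ratio(b, x0)

# Verify membership f(x) - f(x0) - f'(x0)(x - x0) in I * (x - x0)^2 on a grid.
x0, a, b = 0.0, -1.0, 1.0
lo, hi = exp_taylor_enclosure(x0, a, b)
for i in range(101):
    x = a + (b - a) * i / 100
    r = exp_remainder_ratio(x, x0)
    assert lo - 1e-12 <= r <= hi + 1e-12
```

General atomic functions require the recently-developed sharp-bound theory the paper cites; the monotonicity shortcut above is specific to $\exp$.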
At a high level, our algorithm has two steps. First, for a variety of
commonly-used elementary functions (e.g., $\exp$, $\log$), we use
recently-developed theory to derive sharp polynomial upper and lower bounds on
the Taylor remainder series. We then recursively combine the bounds for the
elementary functions using an interval arithmetic variant of Taylor-mode
automatic differentiation. Our algorithm can make efficient use of machine
learning hardware accelerators, and we provide an open source implementation in
JAX.
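The recursive combination step can be illustrated with a minimal degree-2 "Taylor enclosure" arithmetic. The sketch below is our simplification, not the paper's JAX implementation: a function is represented by a triple $(c_0, c_1, I)$ meaning $f(x) \in c_0 + c_1 u + I u^2$ for $u = x - x_0$ in the trust region $U$, and triples are propagated through addition and multiplication using interval arithmetic:

```python
# Interval helpers: an interval is a pair (lo, hi).
def iadd(p, q):
    return (p[0] + q[0], p[1] + q[1])

def imul(p, q):
    prods = [p[0] * q[0], p[0] * q[1], p[1] * q[0], p[1] * q[1]]
    return (min(prods), max(prods))

def iscale(c, p):  # scalar times interval
    return (min(c * p[0], c * p[1]), max(c * p[0], c * p[1]))

def enc_add(a, b):
    # (a0 + a1 u + Ia u^2) + (b0 + b1 u + Ib u^2)
    return (a[0] + b[0], a[1] + b[1], iadd(a[2], b[2]))

def enc_mul(a, b, U):
    # Expand (a0 + a1 u + Ia u^2)(b0 + b1 u + Ib u^2) and fold every term of
    # degree >= 2 into the u^2 coefficient, enclosing u over U = (u_lo, u_hi).
    a0, a1, Ia = a
    b0, b1, Ib = b
    rem = (a1 * b1, a1 * b1)                          # a1 b1 u^2
    rem = iadd(rem, iscale(a0, Ib))                   # a0 Ib u^2
    rem = iadd(rem, iscale(b0, Ia))                   # b0 Ia u^2
    cross = iadd(iscale(a1, Ib), iscale(b1, Ia))      # (a1 Ib + b1 Ia) u^3
    rem = iadd(rem, imul(cross, U))
    rem = iadd(rem, imul(imul(Ia, Ib), imul(U, U)))   # Ia Ib u^4 (loose but valid)
    return (a0 * b0, a0 * b1 + a1 * b0, rem)

# Example: the identity x = x0 + u has the exact enclosure (x0, 1, [0, 0]);
# multiplying it by itself recovers x^2 = x0^2 + 2 x0 u + 1 * u^2.
x0, U = 3.0, (-1.0, 1.0)
ident = (x0, 1.0, (0.0, 0.0))
sq = enc_mul(ident, ident, U)
```

Note that enclosing $u^2$ via `imul(U, U)` is valid but not tight (it ignores that $u^2 \ge 0$); a production implementation would use a dedicated interval-power rule.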
We then turn our attention to applications. Most notably, in a companion
paper we use our new machinery to create the first universal
majorization-minimization optimization algorithms: algorithms that iteratively
minimize an arbitrary loss using a majorizer that is derived automatically,
rather than by hand. We also show that our automatically-derived bounds can be
used for verified global optimization and numerical integration, and to prove
sharper versions of Jensen's inequality.
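As a toy illustration of the majorization-minimization idea (again our sketch, not the companion paper's algorithm), the upper endpoint $I_{\mathrm{hi}}$ of a degree-2 enclosure yields the quadratic majorizer $f(x) \le f(x_0) + f'(x_0)(x - x_0) + I_{\mathrm{hi}} (x - x_0)^2$ on the trust region, since $(x - x_0)^2 \ge 0$; minimizing the majorizer gives a safeguarded update. Below the enclosure is hand-derived for $f(x) = e^x - x$ (whose remainder equals that of $\exp$, the linear term being exact):

```python
import math

def exp_rem_ratio(x, x0):
    # Degree-2 Taylor remainder of exp divided by (x - x0)^2.
    if abs(x - x0) < 1e-12:
        return math.exp(x0) / 2.0
    return (math.exp(x) - math.exp(x0) - math.exp(x0) * (x - x0)) / (x - x0) ** 2

def mm_step(x0, radius):
    # f(x) = exp(x) - x.  Over [x0 - radius, x0 + radius] we have the majorizer
    #   f(x) <= f(x0) + f'(x0) (x - x0) + I_hi (x - x0)^2,
    # where I_hi is the remainder ratio of exp at the right endpoint
    # (the ratio is increasing in x, so its maximum is attained there).
    b = x0 + radius
    g = math.exp(x0) - 1.0            # f'(x0)
    I_hi = exp_rem_ratio(b, x0)       # upper endpoint of the enclosure interval
    step = -g / (2.0 * I_hi)          # unconstrained minimizer of the majorizer
    return x0 + max(-radius, min(radius, step))  # stay inside the trust region

x = 2.0
for _ in range(50):
    x = mm_step(x, radius=1.0)
# x approaches 0, the minimizer of exp(x) - x
```

Because each update minimizes an upper bound that touches $f$ at $x_0$, every iteration is guaranteed not to increase $f$; no step size needs to be tuned by hand.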
Related papers
- Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure [16.324043075920564]
We provide new high-accuracy randomized algorithms for solving linear systems and regression problems. Our algorithms nearly match a natural complexity limit under dense inputs for these problems. We show how to obtain these running times even under the weaker assumption that all but $k$ of the singular values have a bounded generalized mean.
arXiv Detail & Related papers (2025-07-15T20:48:30Z) - A simple and improved algorithm for noisy, convex, zeroth-order optimisation [59.51990161522328]
We construct an algorithm that returns a point $\hat{x} \in \bar{\mathcal{X}}$ such that $f(\hat{x})$ is as small as possible.
We prove that this method ensures that $f(\hat{x}) - \min_{x \in \bar{\mathcal{X}}} f(x)$ is of smaller order than $d^2/\sqrt{n}$ up to poly-logarithmic terms.
arXiv Detail & Related papers (2024-06-26T18:19:10Z) - First and zeroth-order implementations of the regularized Newton method
with lazy approximated Hessians [4.62316736194615]
We develop first-order (Hessian-free) and zeroth-order (derivative-free) implementations of the regularized Newton method.
We also equip our algorithms with the lazy bound update that reuses a previously computed Hessian approximation matrix for several iterations.
arXiv Detail & Related papers (2023-09-05T17:40:54Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Memory-Constrained Algorithms for Convex Optimization via Recursive
Cutting-Planes [23.94542304111204]
This is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretically optimal memory usage and improves on the oracle complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z) - Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix
Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
arXiv Detail & Related papers (2023-06-02T18:55:27Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ achieving $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Parameter-free Stochastic Optimization of Variationally Coherent
Functions [19.468067110814808]
We design and analyze an algorithm for first-order stochastic optimization of a class of functions on $\mathbb{R}^d$.
It is the first algorithm to achieve both of these properties at the same time.
arXiv Detail & Related papers (2021-01-30T15:05:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.