Generalized Optimization: A First Step Towards Category Theoretic
Learning Theory
- URL: http://arxiv.org/abs/2109.10262v1
- Date: Mon, 20 Sep 2021 15:19:06 GMT
- Title: Generalized Optimization: A First Step Towards Category Theoretic
Learning Theory
- Authors: Dan Shiebler
- Abstract summary: We generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method.
We show that the transformation invariances of these algorithms are preserved.
We also show that we can express the change in loss of generalized gradient descent with an inner product-like expression.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Cartesian reverse derivative is a categorical generalization of
reverse-mode automatic differentiation. We use this operator to generalize
several optimization algorithms, including a straightforward generalization of
gradient descent and a novel generalization of Newton's method. We then explore
which properties of these algorithms are preserved in this generalized setting.
First, we show that the transformation invariances of these algorithms are
preserved: while generalized Newton's method is invariant to all invertible
linear transformations, generalized gradient descent is invariant only to
orthogonal linear transformations. Next, we show that we can express the change
in loss of generalized gradient descent with an inner product-like expression,
thereby generalizing the non-increasing and convergence properties of the
gradient descent optimization flow. Finally, we include several numerical
experiments to illustrate the ideas in the paper and demonstrate how we can use
them to optimize polynomial functions over an ordered ring.
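To make the generalized update concrete, here is a minimal sketch (not the paper's implementation) of generalized-gradient-descent-style iterations x <- x - lr * R[f](x, 1) on a single polynomial, with a hand-coded stand-in for the Cartesian reverse derivative and exact rational arithmetic standing in for computation over an ordered ring; the polynomial, step size, and all names are illustrative assumptions.

```python
from fractions import Fraction

# Loss to minimize: f(x) = x^4 - 3x^2 + x, a polynomial map.
def f(x):
    return x**4 - 3 * x**2 + x

# Stand-in for the Cartesian reverse derivative R[f]: given a point x and a
# "cotangent" dy, return dy * f'(x), with f'(x) = 4x^3 - 6x + 1 hand-coded.
def R_f(x, dy):
    return dy * (4 * x**3 - 6 * x + 1)

# Generalized gradient descent update: x <- x - lr * R[f](x, 1).
# Exact rationals keep every iterate inside an ordered ring instead of
# relying on floating-point reals.
x = Fraction(2)
lr = Fraction(1, 10)
for _ in range(8):
    x = x - lr * R_f(x, Fraction(1))

print(float(x), float(f(x)))  # x drifts toward a local minimizer of f
```

In classical Newton's method the gradient is additionally rescaled by the inverse Hessian, which is the source of the stronger invariance to all invertible linear transformations noted in the abstract.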
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite-sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - Series of Hessian-Vector Products for Tractable Saddle-Free Newton
Optimisation of Neural Networks [1.3654846342364308]
We present a scalable optimisation algorithm that can efficiently use the exact inverse Hessian with absolute-value eigenvalues.
Truncating this series after t terms yields a new optimiser that is comparable to other first- and second-order optimisation methods (a generic series-of-Hessian-vector-products sketch appears after this list).
arXiv Detail & Related papers (2023-10-23T13:11:30Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Non asymptotic analysis of Adaptive stochastic gradient algorithms and
applications [0.0]
This paper is devoted to the non-asymptotic analysis of adaptive stochastic gradient algorithms for strongly convex objectives.
All theoretical results are adapted to linear regression and regularized generalized linear models for both the Adagrad and Newton algorithms.
arXiv Detail & Related papers (2023-03-01T07:36:03Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Stability vs Implicit Bias of Gradient Methods on Separable Data and
Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, which stems from properties we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z) - A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational principles which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Lojasiewicz inequality (NL).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - The Convolution Exponential and Generalized Sylvester Flows [82.18442368078804]
This paper introduces a new method to build linear flows, by taking the exponential of a linear transformation.
An important insight is that the exponential can be computed implicitly, which allows the use of convolutional layers.
We show that the convolution exponential outperforms other linear transformations in generative flows on CIFAR10.
arXiv Detail & Related papers (2020-06-02T19:43:36Z)
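As referenced in the Hessian-vector-products entry above, the following is a generic, hedged sketch of optimising with a truncated series of Hessian-vector products. It uses a plain Neumann series to approximate H^{-1} g on a convex quadratic rather than the cited paper's absolute-value-eigenvalue construction, and the test problem, seed, step size, and truncation length are all illustrative assumptions.

```python
import numpy as np

# Illustrative strongly convex quadratic: f(w) = 0.5 w^T A w - b^T w,
# so grad(w) = A w - b and the Hessian is the constant matrix A.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)        # symmetric positive definite Hessian
b = rng.standard_normal(5)

def grad(w):
    return A @ w - b

def hvp(w, v):
    # Hessian-vector product; for a neural network this would come from
    # automatic differentiation rather than an explicit matrix.
    return A @ v

def series_newton_direction(w, t, alpha):
    # Approximate H^{-1} g with a truncated Neumann series,
    #   H^{-1} g = alpha * sum_{k >= 0} (I - alpha*H)^k g,
    # which converges when 0 < alpha < 2 / lambda_max(H); each extra term
    # costs one Hessian-vector product.  (The cited paper targets |H|^{-1}
    # via a different series; this is only a generic illustration.)
    g = grad(w)
    term = g.copy()
    total = g.copy()
    for _ in range(t):
        term = term - alpha * hvp(w, term)   # multiply by (I - alpha*H)
        total = total + term
    return alpha * total

alpha = 0.9 / np.linalg.norm(A, 2)           # safe step for the series
w = np.zeros(5)
for _ in range(10):
    w = w - series_newton_direction(w, t=20, alpha=alpha)

print(np.linalg.norm(grad(w)))  # close to zero: the quadratic is solved
```

With t = 0 the direction reduces to plain gradient descent scaled by alpha, so the truncation length interpolates between first-order and approximate second-order behaviour.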