Generalized Optimization: A First Step Towards Category Theoretic
Learning Theory
- URL: http://arxiv.org/abs/2109.10262v1
- Date: Mon, 20 Sep 2021 15:19:06 GMT
- Title: Generalized Optimization: A First Step Towards Category Theoretic
Learning Theory
- Authors: Dan Shiebler
- Abstract summary: We generalize several optimization algorithms, including a straightforward generalization of gradient descent and a novel generalization of Newton's method.
We show that the transformation invariances of these algorithms are preserved.
We also show that we can express the change in loss of generalized gradient descent with an inner product-like expression.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Cartesian reverse derivative is a categorical generalization of
reverse-mode automatic differentiation. We use this operator to generalize
several optimization algorithms, including a straightforward generalization of
gradient descent and a novel generalization of Newton's method. We then explore
which properties of these algorithms are preserved in this generalized setting.
First, we show that the transformation invariances of these algorithms are
preserved: while generalized Newton's method is invariant to all invertible
linear transformations, generalized gradient descent is invariant only to
orthogonal linear transformations. Next, we show that we can express the change
in loss of generalized gradient descent with an inner product-like expression,
thereby generalizing the non-increasing and convergence properties of the
gradient descent optimization flow. Finally, we include several numerical
experiments to illustrate the ideas in the paper and demonstrate how we can use
them to optimize polynomial functions over an ordered ring.
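To make the generalized update concrete, here is a minimal sketch (not the paper's implementation) of generalized-gradient-descent-style iterations x <- x - lr * R[f](x, 1) on a single polynomial, with a hand-coded stand-in for the Cartesian reverse derivative and exact rational arithmetic standing in for computation over an ordered ring; the polynomial, step size, and all names are illustrative assumptions.

```python
from fractions import Fraction

# Loss to minimize: f(x) = x^4 - 3x^2 + x, a polynomial map.
def f(x):
    return x**4 - 3 * x**2 + x

# Stand-in for the Cartesian reverse derivative R[f]: given a point x and a
# "cotangent" dy, return dy * f'(x), with f'(x) = 4x^3 - 6x + 1 hand-coded.
def R_f(x, dy):
    return dy * (4 * x**3 - 6 * x + 1)

# Generalized gradient descent update: x <- x - lr * R[f](x, 1).
# Exact rationals keep every iterate inside an ordered ring instead of
# relying on floating-point reals.
x = Fraction(2)
lr = Fraction(1, 10)
for _ in range(8):
    x = x - lr * R_f(x, Fraction(1))

print(float(x), float(f(x)))  # x drifts toward a local minimizer of f
```

In classical Newton's method the gradient is additionally rescaled by the inverse Hessian, which is the source of the stronger invariance to all invertible linear transformations noted in the abstract.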
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite-sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - Series of Hessian-Vector Products for Tractable Saddle-Free Newton
Optimisation of Neural Networks [1.3654846342364308]
We present a scalable optimisation algorithm that can efficiently use the exact inverse Hessian with absolute-value eigenvalues.
Truncating this series after t terms yields a new optimiser that is comparable to other first- and second-order optimisation methods (a generic series-of-Hessian-vector-products sketch appears after this list).
arXiv Detail & Related papers (2023-10-23T13:11:30Z) - The Inductive Bias of Flatness Regularization for Deep Matrix
Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z) - Non asymptotic analysis of Adaptive stochastic gradient algorithms and
applications [0.0]
This paper is devoted to the non-asymptotic analysis of adaptive stochastic gradient algorithms for strongly convex objectives.
All theoretical results are adapted to linear regression and regularized generalized linear models for both the Adagrad and Newton algorithms.
arXiv Detail & Related papers (2023-03-01T07:36:03Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable parts separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Stability vs Implicit Bias of Gradient Methods on Separable Data and
Beyond [33.593203156666746]
We focus on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification.
We give an additional unified explanation for this generalization, which stems from properties we refer to as realizability and self-boundedness.
In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
arXiv Detail & Related papers (2022-02-27T19:56:36Z) - A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational principles which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Lojasiewicz inequality (NL).
New definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - The Convolution Exponential and Generalized Sylvester Flows [82.18442368078804]
This paper introduces a new method to build linear flows, by taking the exponential of a linear transformation.
An important insight is that the exponential can be computed implicitly, which allows the use of convolutional layers.
We show that the convolution exponential outperforms other linear transformations in generative flows on CIFAR10.
arXiv Detail & Related papers (2020-06-02T19:43:36Z)
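As referenced in the Hessian-vector-products entry above, the following is a generic, hedged sketch of optimising with a truncated series of Hessian-vector products. It uses a plain Neumann series to approximate H^{-1} g on a convex quadratic rather than the cited paper's absolute-value-eigenvalue construction, and the test problem, seed, step size, and truncation length are all illustrative assumptions.

```python
import numpy as np

# Illustrative strongly convex quadratic: f(w) = 0.5 w^T A w - b^T w,
# so grad(w) = A w - b and the Hessian is the constant matrix A.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)        # symmetric positive definite Hessian
b = rng.standard_normal(5)

def grad(w):
    return A @ w - b

def hvp(w, v):
    # Hessian-vector product; for a neural network this would come from
    # automatic differentiation rather than an explicit matrix.
    return A @ v

def series_newton_direction(w, t, alpha):
    # Approximate H^{-1} g with a truncated Neumann series,
    #   H^{-1} g = alpha * sum_{k >= 0} (I - alpha*H)^k g,
    # which converges when 0 < alpha < 2 / lambda_max(H); each extra term
    # costs one Hessian-vector product.  (The cited paper targets |H|^{-1}
    # via a different series; this is only a generic illustration.)
    g = grad(w)
    term = g.copy()
    total = g.copy()
    for _ in range(t):
        term = term - alpha * hvp(w, term)   # multiply by (I - alpha*H)
        total = total + term
    return alpha * total

alpha = 0.9 / np.linalg.norm(A, 2)           # safe step for the series
w = np.zeros(5)
for _ in range(10):
    w = w - series_newton_direction(w, t=20, alpha=alpha)

print(np.linalg.norm(grad(w)))  # close to zero: the quadratic is solved
```

With t = 0 the direction reduces to plain gradient descent scaled by alpha, so the truncation length interpolates between first-order and approximate second-order behaviour.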