Related papers: Gradient-based optimization for variational empirical Bayes multiple regression

URL: http://arxiv.org/abs/2411.14570v1
Date: Thu, 21 Nov 2024 20:35:44 GMT
Title: Gradient-based optimization for variational empirical Bayes multiple regression
Authors: Saikat Banerjee, Peter Carbonetto, Matthew Stephens
Abstract summary: We propose alternative optimization approaches based on gradient-based (quasi-Newton) methods. We show that GradVI produces similar predictive performance and converges in fewer iterations when predictors are highly correlated. Our methods are implemented in open-source Python software, GradVI.
Score: 2.6763498831034043
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Variational empirical Bayes (VEB) methods provide a practically attractive approach to fitting large, sparse, multiple regression models. These methods usually use coordinate ascent to optimize the variational objective function, an approach known as coordinate ascent variational inference (CAVI). Here we propose alternative optimization approaches based on gradient-based (quasi-Newton) methods, which we call gradient-based variational inference (GradVI). GradVI exploits a recent result from Kim et al. [arXiv:2208.10910] which writes the VEB regression objective function as a penalized regression. Unfortunately the penalty function is not available in closed form, and we present and compare two approaches to dealing with this problem. In simple situations where CAVI performs well, we show that GradVI produces similar predictive performance, and GradVI converges in fewer iterations when the predictors are highly correlated. Furthermore, unlike CAVI, the key computations in GradVI are simple matrix-vector products, and so GradVI is much faster than CAVI in settings where the design matrix admits fast matrix-vector products (e.g., as we show here, trend filtering applications) and lends itself to parallelized implementations in ways that CAVI does not. GradVI is also very flexible, and could exploit automatic differentiation to easily implement different prior families. Our methods are implemented in open-source Python software, GradVI (available from https://github.com/stephenslab/gradvi ).
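
Illustration (not from the paper): the abstract notes that gradient-based methods only need matrix-vector products, which can be cheap for structured design matrices such as those in trend filtering. The sketch below assumes the simplest case, a zeroth-order (changepoint) basis where the design matrix H is the n-by-n lower-triangular matrix of ones; the paper's exact parametrization may differ. The function names are hypothetical and are not part of the GradVI package. It shows that H b and H^T r reduce to cumulative sums, so the least-squares part of a penalized-regression gradient can be computed in O(n) time without forming H.

```python
from itertools import accumulate


def h_matvec(b):
    """Compute H @ b for the lower-triangular matrix of ones: prefix sums of b."""
    return list(accumulate(b))


def ht_matvec(r):
    """Compute H.T @ r for the same matrix: suffix sums of r."""
    return list(accumulate(reversed(r)))[::-1]


def least_squares_grad(y, b):
    """Gradient of 0.5 * ||y - H b||^2 with respect to b, i.e. -H.T (y - H b)."""
    residual = [yi - fi for yi, fi in zip(y, h_matvec(b))]
    return [-g for g in ht_matvec(residual)]


def dense_h(n):
    """Explicit n-by-n version of H, used only to check the fast products."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]


def dense_matvec(m, v):
    """Plain O(n^2) matrix-vector product for comparison."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
```

A quasi-Newton routine would call a gradient function like this once per iteration and add the gradient of the penalty term; the penalty itself is omitted here because, as the abstract says, it has no closed form.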

Related papers

Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods. In particular, we derive improved convergence rates for Gradient Descent with (Smoothed) Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z)
AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix [9.629238108795013]
We propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successive steps as the diagonal elements. We evaluate AGD on public datasets from Natural Language Processing (NLP), Computer Vision (CV), and Recommendation Systems (RecSys). Our experimental results demonstrate that AGD outperforms the state-of-the-art (SOTA) techniques, achieving highly competitive or significantly better predictive performance.
arXiv Detail & Related papers (2023-12-04T06:20:14Z)
ELRA: Exponential learning rate adaption gradient descent optimization method [83.88591755871734]
We present a novel, fast (exponential-rate), ab initio (hyperparameter-free) gradient-based adaption method. The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness. It can be applied to problems of any dimension $n$ and scales only linearly with the dimension.
arXiv Detail & Related papers (2023-09-12T14:36:13Z)
Natural Gradient Hybrid Variational Inference with Application to Deep Mixed Models [0.0]
We propose a fast and accurate variational inference (VI) method for deep mixed models. It employs a well-defined gradient variational optimization that targets the joint posterior of the global parameters and latent variables. We show that the approach is faster and more accurate than two cutting-edge natural gradient VI methods.
arXiv Detail & Related papers (2023-02-27T06:24:20Z)
Optimization using Parallel Gradient Evaluations on Multiple Parameters [51.64614793990665]
We propose a first-order method for convex optimization, where gradients from multiple parameters can be used during each step of gradient descent. Our method uses gradients from multiple parameters in synergy to update these parameters together towards the optima.
arXiv Detail & Related papers (2023-02-06T23:39:13Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
A Variant of Gradient Descent Algorithm Based on Gradient Averaging [0.0]
In regression tasks, it is observed that the behaviour of Grad-Avg is almost identical to that of Stochastic Gradient Descent (SGD). We show that Grad-Avg converges faster than other state-of-the-art optimizers for the classification task on two benchmark datasets.
arXiv Detail & Related papers (2020-12-04T03:43:12Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of the OptimisticOA algorithm for nonconvex-nonconcave min-max problems. Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove that adaptive gradient methods converge in expectation to a first-order stationary point. Our analyses shed light on better understanding adaptive gradient methods in optimizing nonconvex objectives.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
