Related papers: CONGO: Compressive Online Gradient Optimization

CONGO: Compressive Online Gradient Optimization

URL: http://arxiv.org/abs/2407.06325v2
Date: Sat, 26 Oct 2024 22:12:25 GMT
Title: CONGO: Compressive Online Gradient Optimization
Authors: Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena, Dheeraj Narasimha, Srinivas Shakkottai, Aditya Akella,
Abstract summary: We address the challenge of zeroth-order online convex optimization where the objective function's gradient exhibits sparsity. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs.
Score: 9.706490948078018
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the challenge of zeroth-order online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function's gradient even when the only information available is a limited number of function samples. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs. Here, a job must be processed by potentially many queues in sequence to produce an output, and the service time at any queue is a function of the resources allocated to that queue. Since resources are costly, the end-to-end latency for jobs must be balanced with the overall cost of the resources used. While the number of queues is substantial, the latency function primarily reacts to resource changes in only a few, rendering the gradient sparse. We tackle this problem by introducing the Compressive Online Gradient Optimization framework which allows compressive sensing methods previously applied to stochastic optimization to achieve regret bounds with an optimal dependence on the time horizon without the full problem dimension appearing in the bound. For specific algorithms, we reduce the samples required per gradient estimate to scale with the gradient's sparsity factor rather than its full dimensionality. Numerical simulations and real-world microservices benchmarks demonstrate CONGO's superiority over gradient descent approaches that do not account for sparsity.

Related papers

Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
Gradient-free neural topology optimization [0.0]
gradient-free algorithms require many more iterations to converge when compared to gradient-based algorithms. This has made them unviable for topology optimization due to the high computational cost per iteration and high dimensionality of these problems. We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent [43.097493761380186]
gradient algorithms are an efficient method of approximately solving linear systems. We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum. Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z)
Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other. As the model size grows, the training latency due to limited transmission bandwidth and private information degrades while using differential privacy (DP) protection. We propose sparsification empowered FL framework wireless channels, in over to improve training efficiency without sacrificing convergence performance.
arXiv Detail & Related papers (2023-04-09T05:21:15Z)
Continuous-Time Meta-Learning with Forward Mode Differentiation [65.26189016950343]
We introduce Continuous Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Treating the learning process as an ODE offers the notable advantage that the length of the trajectory is now continuous. We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems.
arXiv Detail & Related papers (2022-03-02T22:35:58Z)
Over-Parametrized Matrix Factorization in the Presence of Spurious Stationary Points [20.515985046189268]
This work considers the computational aspects of over-parametrized matrix factorization. In this work, we establish that the gradient flow of the corresponding merit function converges to a global minimizer. We numerically observe that a proposed discretization of the gradient flow, inspired by primal-dual algorithms, is successful when randomly.
arXiv Detail & Related papers (2021-12-25T19:13:37Z)
Gradient and Projection Free Distributed Online Min-Max Resource Optimization [26.681658600897688]
We consider distributed online min-max resource allocation with a set of parallel agents. We propose a novel online strategy called Distributed Online resource Re-Alggler (DORA) DORA does not require calculation or projection operation, unlike most existing online strategies.
arXiv Detail & Related papers (2021-12-07T18:42:07Z)
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics. We prove that our model-based procedure converges in noisy gradient setting. This is an interesting step for constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.