Projection-free Online Learning with Arbitrary Delays
- URL: http://arxiv.org/abs/2204.04964v2
- Date: Sat, 20 May 2023 08:14:20 GMT
- Title: Projection-free Online Learning with Arbitrary Delays
- Authors: Yuanyu Wan and Yibo Wang and Chang Yao and Wei-Wei Tu and Lijun Zhang
- Abstract summary: We generalize the online Frank-Wolfe (OFW) algorithm and the online smooth projection-free (OSPF) algorithm into a delayed setting.
Our novel analysis shows that under a relatively large amount of delay, the generalized OFW and OSPF enjoy the same regret bound as OFW and OSPF in the non-delayed setting.
- Score: 38.13351554274417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Projection-free online learning, which eschews the projection operation via
less expensive computations such as linear optimization (LO), has received much
interest recently due to its efficiency in handling high-dimensional problems
with complex constraints. However, previous studies assume that any queried
gradient is revealed immediately, which may not hold in practice and limits
their applications. To address this limitation, we generalize the online
Frank-Wolfe (OFW) algorithm and the online smooth projection-free (OSPF)
algorithm, which are state-of-the-art LO-based projection-free online
algorithms for non-smooth and smooth functions respectively, into a delayed
setting where queried gradients can be delayed by arbitrary rounds.
Specifically, the main idea of our generalized OFW is to perform an update
similar to the original OFW after receiving any delayed gradient, and play the
latest decision for each round. Moreover, the essential change on OSPF is to
replace the sum of queried gradients, which is originally utilized in each
update, with the sum of available gradients. Despite their simplicity, our
novel analysis shows that under a relatively large amount of delay, the
generalized OFW and OSPF enjoy the same regret bound as OFW and OSPF in the
non-delayed setting, respectively.
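The core idea of the generalized OFW (perform an OFW-style update whenever a delayed gradient arrives, and play the latest decision at each round) can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the paper's exact algorithm: the quadratic surrogate objective, the step-size schedule, and the L2-ball linear minimization oracle are all placeholder choices for illustration.

```python
import numpy as np

def lmo_l2_ball(direction, radius=1.0):
    """Linear minimization oracle over an L2 ball (illustrative domain):
    argmin_{||v|| <= radius} <direction, v>."""
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return np.zeros_like(direction)
    return -radius * direction / norm

def delayed_ofw(gradient_stream, dim, T, eta=0.1, radius=1.0):
    """Sketch of delay-tolerant online Frank-Wolfe.

    gradient_stream(t) returns the (round, gradient) pairs whose
    delayed feedback becomes available at round t; each arrival
    triggers one Frank-Wolfe step, and the learner always plays
    its latest decision.
    """
    x1 = np.zeros(dim)           # initial decision
    x = x1.copy()                # latest decision
    grad_sum = np.zeros(dim)     # sum of gradients received so far
    played = []
    for t in range(1, T + 1):
        played.append(x.copy())              # play the latest decision
        for _, g in gradient_stream(t):      # process delayed feedback
            grad_sum += g
            # Gradient of an illustrative surrogate
            # F(x) = eta * <grad_sum, x> + ||x - x1||^2
            surrogate_grad = eta * grad_sum + 2.0 * (x - x1)
            v = lmo_l2_ball(surrogate_grad, radius)
            sigma = 1.0 / np.sqrt(t)         # illustrative step size
            x = (1.0 - sigma) * x + sigma * v  # convex FW update
    return played
```

Because each update is a convex combination of points in the feasible set, the decisions remain feasible no matter how the delays interleave; only the linear optimization oracle is ever invoked, never a projection.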
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z)
- Accelerated Training through Iterative Gradient Propagation Along the Residual Path [46.577761606415805]
Highway backpropagation is a parallelizable iterative algorithm that approximates backpropagation.
It is adaptable to a diverse set of common architectures, ranging from ResNets and Transformers to recurrent neural networks.
arXiv Detail & Related papers (2025-01-28T17:14:42Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We present applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - Online Dynamic Submodular Optimization [0.0]
We propose new algorithms with provable performance for online binary optimization.
We numerically test our algorithms in two power system applications: fast-timescale demand response and real-time distribution network reconfiguration.
arXiv Detail & Related papers (2023-06-19T10:37:15Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Isotuning With Applications To Scale-Free Online Learning [19.52475623314373]
We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms.
Our first and main tool, isotuning, generalizes the idea of designing adaptive learning rates that balance the trade-off in the regret bound.
The second tool is an online correction, which allows us to obtain centered bounds for many algorithms, to prevent the regret bounds from being vacuous.
The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret.
arXiv Detail & Related papers (2021-12-29T14:58:56Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous gradient descent (DASGD)
We show that DASGD converges to a global optimal implementation model under same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Inertial Proximal Deep Learning Alternating Minimization for Efficient
Neutral Network Training [16.165369437324266]
This work develops an improved DLAM by the well-known inertial technique, namely iPDLAM, which predicts a point by linearization of current and last iterates.
Numerical results on real-world datasets are reported to demonstrate the efficiency of our proposed algorithm.
arXiv Detail & Related papers (2021-01-30T16:40:08Z) - Fast and Complete: Enabling Complete Neural Network Verification with
Rapid and Massively Parallel Incomplete Verifiers [112.23981192818721]
We propose to use backward mode linear relaxation based analysis (LiRPA) to replace Linear Programming (LP) during the BaB process.
Unlike LP, LiRPA applied naively can produce much weaker bounds and may even fail to check certain conflicts between sub-domains during splitting.
We demonstrate an order of magnitude speedup compared to existing LP-based approaches.
arXiv Detail & Related papers (2020-11-27T16:42:12Z) - Improving the Backpropagation Algorithm with Consequentialism Weight
Updates over Mini-Batches [0.40611352512781856]
We show that it is possible to consider a multi-layer neural network as a stack of adaptive filters.
We introduce a better algorithm by predicting and then amending the adverse consequences of the actions that take place in BP, even before they happen.
Our experiments show the usefulness of our algorithm in the training of deep neural networks.
arXiv Detail & Related papers (2020-03-11T08:45:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.