Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR
Prediction
- URL: http://arxiv.org/abs/2107.14432v5
- Date: Wed, 18 Oct 2023 07:59:36 GMT
- Title: Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR
Prediction
- Authors: Yun Yue, Yongchao Liu, Suo Tong, Minghao Li, Zhen Zhang, Chunyang Wen,
Huanjun Bao, Lihong Gu, Jinjie Gu, Yixiang Mu
- Abstract summary: We develop a novel framework that adds regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning.
We establish theoretically proven convergence guarantees in stochastic convex settings.
Our methods can achieve extremely high sparsity with significantly better or highly competitive performance.
- Score: 19.71671771503269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a novel framework that adds the regularizers of the sparse group
lasso to a family of adaptive optimizers in deep learning, such as Momentum,
Adagrad, Adam, AMSGrad, and AdaHessian, and creates a new class of optimizers, which
are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group
AdaHessian, etc., accordingly. We establish theoretically proven convergence
guarantees in the stochastic convex settings, based on primal-dual methods. We
evaluate the regularization effect of our new optimizers on three large-scale
real-world ad click datasets with state-of-the-art deep learning models. The
experimental results reveal that, compared with the original optimizers followed by a
magnitude-pruning post-processing step, our optimizers significantly improve model
performance at the same sparsity level. Furthermore, in comparison to the cases
without magnitude pruning, our
methods can achieve extremely high sparsity with significantly better or highly
competitive performance. The code is available at
https://github.com/intelligent-machine-learning/dlrover/blob/master/tfplus.
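To make the idea concrete, below is a minimal NumPy sketch (not the authors' tfplus TensorFlow implementation, and not their exact primal-dual algorithm): a plain Adam-style update followed by a group-lasso proximal step that can zero out whole parameter groups, such as per-feature embedding rows. The group layout, the `lambda_group` strength, and the dummy gradients are assumptions made only for this illustration.

```python
# Illustrative NumPy sketch, not the authors' tfplus implementation: an Adam-style
# update followed by a group-lasso proximal (soft-thresholding) step. Rows of `w`
# play the role of parameter groups (e.g., embedding rows in a CTR model); the
# regularization strength `lambda_group` is a hypothetical value for this demo.
import numpy as np

def group_adam_step(w, g, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999,
                    eps=1e-8, lambda_group=5.0):
    """One Adam step on `w` (groups x dim), then shrink each group's L2 norm."""
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)

    # Group-lasso proximal step: scale each row toward zero; rows whose L2 norm
    # falls below the threshold lr * lambda_group become exactly zero, which is
    # what produces group-level sparsity rather than scattered zero weights.
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lr * lambda_group / (norms + 1e-12))
    return w * scale, m, v

# Toy usage: 5 embedding rows of width 4 driven by a dummy quadratic loss.
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 4))
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 501):
    g = 0.5 * w + 0.1 * rng.normal(size=w.shape)  # gradient of a toy objective
    w, m, v = group_adam_step(w, g, m, v, t)
print("rows zeroed out:", int((np.linalg.norm(w, axis=1) == 0).sum()), "of", w.shape[0])
```

In this sketch, sparsity emerges during training from the shrinkage step, which loosely mirrors the abstract's contrast with the baseline that prunes small-magnitude weights only after training.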
Related papers
- Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models that combine these techniques to fuse their complementary benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo achieves results on par with AdamW while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - Soft Merging: A Flexible and Robust Soft Model Merging Approach for
Enhanced Neural Network Performance [6.599368083393398]
Stochastic Gradient Descent (SGD) is often limited to converging to local optima, which constrains model performance.
The soft merging method mitigates the impact of local optima models that yield undesirable results.
Experiments underscore the effectiveness of the merged networks.
arXiv Detail & Related papers (2023-09-21T17:07:31Z) - Bidirectional Looking with A Novel Double Exponential Moving Average to
Adaptive and Non-adaptive Momentum Optimizers [109.52244418498974]
We propose a novel Admeta (A Double exponential Moving averagE Adaptive and non-adaptive momentum) framework.
We provide two implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM.
arXiv Detail & Related papers (2023-07-02T18:16:06Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizers, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector [156.43671738038657]
We present a novel end-to-end group collaborative learning network, termed GCoNet+.
GCoNet+ can effectively and efficiently identify co-salient objects in natural scenes.
arXiv Detail & Related papers (2022-05-30T23:49:19Z) - Adaptive Optimization with Examplewise Gradients [23.504973357538418]
We propose a new, more general approach to the design of gradient-based optimization methods for machine learning.
In this new framework, iterations assume access to a batch of gradient estimates per parameter, rather than a single estimate.
This better reflects the information that is actually available in typical machine learning setups.
arXiv Detail & Related papers (2021-11-30T23:37:01Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs (the standard definition of this divergence is recalled after this list).
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - An Efficient Framework for Clustered Federated Learning [26.24231986590374]
We address the problem of federated learning (FL) where users are distributed into clusters.
We propose the Iterative Federated Clustering Algorithm (IFCA)
We show that our algorithm is efficient in non-convex problems such as neural networks.
arXiv Detail & Related papers (2020-06-07T08:48:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.