Enhancing Fractional Gradient Descent with Learned Optimizers
- URL: http://arxiv.org/abs/2510.18783v1
- Date: Tue, 21 Oct 2025 16:33:20 GMT
- Title: Enhancing Fractional Gradient Descent with Learned Optimizers
- Authors: Jan Sobotka, Petr Šimánek, Pavel Kordík,
- Abstract summary: Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning.<n>Our method's metas meta-learned performance is comparable to a fully black-box metalearned.<n>L2O-CFGD can thus serve as a powerful tool for researchers.
- Score: 3.3900431852643114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fractional Gradient Descent (FGD) offers a novel and promising way to accelerate optimization by incorporating fractional calculus into machine learning. Although FGD has shown encouraging initial results across various optimization tasks, it faces significant challenges with convergence behavior and hyperparameter selection. Moreover, the impact of its hyperparameters is not fully understood, and scheduling them is particularly difficult in non-convex settings such as neural network training. To address these issues, we propose a novel approach called Learning to Optimize Caputo Fractional Gradient Descent (L2O-CFGD), which meta-learns how to dynamically tune the hyperparameters of Caputo FGD (CFGD). Our method's meta-learned schedule outperforms CFGD with static hyperparameters found through an extensive search and, in some tasks, achieves performance comparable to a fully black-box meta-learned optimizer. L2O-CFGD can thus serve as a powerful tool for researchers to identify high-performing hyperparameters and gain insights on how to leverage the history-dependence of the fractional differential in optimization.
Related papers
- MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper [75.6582687942241]
We propose Mixture of Expert Prompt Tuning (MEPT) as an effective and efficient manifold-mapping framework.<n>MEPT integrates multiple prompt experts to adaptively learn diverse and non-stationary data distributions.<n> Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter efficient baselines on SuperGLUE.
arXiv Detail & Related papers (2025-08-31T21:19:25Z) - AYLA: Amplifying Gradient Sensitivity via Loss Transformation in Non-Convex Optimization [0.0]
Gradient Descent (SGD) and its variants, such as ADAM, are foundational to deep learning optimization.<n>This paper introduces AYLA, a novel framework that enhances dynamics training.
arXiv Detail & Related papers (2025-04-02T16:31:39Z) - Constrained Hybrid Metaheuristic Algorithm for Probabilistic Neural Networks Learning [0.3686808512438362]
This study investigates the potential of hybrid metaheuristic algorithms to enhance the training of Probabilistic Neural Networks (PNNs)<n>Traditional learning methods, such as gradient-based approaches, often struggle to optimise high-dimensional and uncertain environments.<n>We propose the constrained Hybrid Metaheuristic (cHM) algorithm, which combines multiple population-based optimisation techniques into a unified framework.
arXiv Detail & Related papers (2025-01-26T19:49:16Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function [0.0]
We introduce sigSignGrad and tanhSignGrad, two novel gradients that integrate adaptive friction coefficients.
Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S.
Experiments on CIFAR-10, Mini-Image-Net using ResNet50 and ViT architectures confirm the superior performance our proposeds.
arXiv Detail & Related papers (2024-08-07T03:20:46Z) - Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models with the combinations of these techniques to fuse the complementary optimization benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z) - AdaLomo: Low-memory Optimization with Adaptive Learning Rate [59.64965955386855]
We introduce low-memory optimization with adaptive learning rate (AdaLomo) for large language models.
AdaLomo results on par with AdamW, while significantly reducing memory requirements, thereby lowering the hardware barrier to training large language models.
arXiv Detail & Related papers (2023-10-16T09:04:28Z) - An Empirical Evaluation of Zeroth-Order Optimization Methods on
AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD)
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z) - Meta-Learning to Improve Pre-Training [38.75981465367226]
Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks.
PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models.
We propose an efficient, gradient-based algorithm to meta-learn PT hyper parameters.
arXiv Detail & Related papers (2021-11-02T17:26:50Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction of one parameter changes in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.