Natural Spectral Fusion: p-Exponent Cyclic Scheduling and Early Decision-Boundary Alignment in First-Order Optimization
- URL: http://arxiv.org/abs/2509.04713v1
- Date: Fri, 05 Sep 2025 00:00:00 GMT
- Title: Natural Spectral Fusion: p-Exponent Cyclic Scheduling and Early Decision-Boundary Alignment in First-Order Optimization
- Authors: Gongyue Zhang, Honghai Liu
- Abstract summary: We propose Natural Spectral Fusion (NSF): reframing training as controllable spectral coverage and information fusion. NSF has two core principles: treating the optimizer as a spectral controller that dynamically balances low- and high-frequency information; and periodically reweighting frequency bands at negligible cost. We show that cyclic scheduling consistently reduces test error and demonstrates distinct convergence behavior.
- Score: 11.323131201168572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral behaviors have been widely discussed in machine learning, yet the optimizer's own spectral bias remains unclear. We argue that first-order optimizers exhibit an intrinsic frequency preference that significantly reshapes the optimization path. To address this, we propose Natural Spectral Fusion (NSF): reframing training as controllable spectral coverage and information fusion rather than merely scaling step sizes. NSF has two core principles: treating the optimizer as a spectral controller that dynamically balances low- and high-frequency information; and periodically reweighting frequency bands at negligible cost, without modifying the model, data, or training pipeline. We realize NSF via a p-exponent extension of the second-moment term, enabling both positive and negative exponents, and implement it through cyclic scheduling. Theory and experiments show that adaptive methods emphasize low frequencies, SGD is near-neutral, and negative exponents amplify high-frequency information. Cyclic scheduling broadens spectral coverage, improves cross-band fusion, and induces early decision-boundary alignment, where accuracy improves even while loss remains high. Across multiple benchmarks, with identical learning-rate strategies and fixed hyperparameters, p-exponent cyclic scheduling consistently reduces test error and demonstrates distinct convergence behavior; on some tasks, it matches baseline accuracy with only one-quarter of the training cost. Overall, NSF reveals the optimizer's role as an active spectral controller and provides a unified, controllable, and efficient framework for first-order optimization.
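As a concrete illustration of the p-exponent idea, the sketch below generalizes an Adam-style update so that the second-moment denominator is raised to a schedulable power p instead of the fixed 1/2, with p cycled over training. The cycle shape, endpoints, and hyperparameter values are illustrative assumptions, not the authors' implementation.

```python
import torch

def p_schedule(step, cycle_len=1000, p_min=-0.2, p_max=0.5):
    """Cyclic p-exponent: triangular wave between p_min and p_max.
    Cycle length and endpoints are illustrative choices, not values from the paper."""
    phase = (step % cycle_len) / cycle_len            # in [0, 1)
    tri = 1.0 - abs(2.0 * phase - 1.0)                # 0 -> 1 -> 0
    return p_min + (p_max - p_min) * tri

@torch.no_grad()
def p_adam_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
                eps=1e-8, p=0.5):
    """One Adam-like step with the second-moment exponent generalized from
    1/2 to p: p=0.5 behaves like Adam, p=0 removes the preconditioning,
    and p<0 amplifies high-variance (high-frequency) coordinates.
    A minimal sketch of the p-exponent idea, not the authors' optimizer."""
    b1, b2 = betas
    state['t'] = state.get('t', 0) + 1
    for i, (w, g) in enumerate(zip(params, grads)):
        m = state.setdefault(('m', i), torch.zeros_like(w))
        v = state.setdefault(('v', i), torch.zeros_like(w))
        m.mul_(b1).add_(g, alpha=1 - b1)              # first moment
        v.mul_(b2).addcmul_(g, g, value=1 - b2)       # second moment
        m_hat = m / (1 - b1 ** state['t'])
        v_hat = v / (1 - b2 ** state['t'])
        w.add_(-lr * m_hat / (v_hat + eps).pow(p))    # p replaces the usual 1/2
```

With p fixed at 0.5 this reduces to an Adam-like update, p = 0 yields a momentum-SGD-like step, and feeding `p = p_schedule(step)` into `p_adam_step` sweeps between those regimes once per cycle.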
Related papers
- Spectral Gating Networks [65.9496901693099]
We introduce Spectral Gating Networks (SGN) to bring frequency-rich expressivity to feed-forward networks. SGN augments a standard activation pathway with a compact spectral pathway and learnable gates that allow the model to start from a stable base behavior. It consistently improves accuracy-efficiency trade-offs under comparable computational budgets.
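The gating mechanism could plausibly look like the following sketch: a standard feed-forward path plus an FFT-domain path whose contribution is controlled by a zero-initialized learnable gate, so training starts from the base behavior. The layer shapes, the tanh gate, and the per-frequency scaling are assumptions for illustration, not the SGN architecture as published.

```python
import torch
import torch.nn as nn

class SpectralGatedFFN(nn.Module):
    """Illustrative spectral-gating block: a plain FFN path plus a compact
    FFT-domain path, mixed by a learnable gate initialized at zero so the
    block initially reproduces the base FFN. A guess at the general
    mechanism in the abstract above, not the paper's architecture."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                 nn.Linear(hidden, dim))
        # per-frequency scaling applied to the rFFT over the feature axis
        self.spec_scale = nn.Parameter(torch.ones(dim // 2 + 1))
        self.gate = nn.Parameter(torch.zeros(1))     # zero-init: stable base

    def forward(self, x):                            # x: (batch, dim)
        base = self.ffn(x)
        spec = torch.fft.rfft(x, dim=-1)             # frequency-domain path
        spec = spec * self.spec_scale
        spec = torch.fft.irfft(spec, n=x.shape[-1], dim=-1)
        return base + torch.tanh(self.gate) * spec
```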
arXiv Detail & Related papers (2026-02-07T20:00:49Z) - The Role of Target Update Frequencies in Q-Learning [4.76285598583384]
The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. We formulate periodic target updates as a nested optimization scheme in which each outer iteration applies an inexact Bellman optimality operator. We show that the optimal target update frequency increases geometrically over the course of the learning process.
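A minimal way to act on that finding is to let the interval between target-network syncs grow geometrically during training, as in the loop skeleton below; `env_step`, `tuf0`, and `growth` are hypothetical names and values, not taken from the paper.

```python
import copy

def train_with_geometric_tuf(q_net, env_step, optimizer, total_steps=100_000,
                             tuf0=100, growth=1.5):
    """Q-learning loop skeleton whose target-update interval grows
    geometrically (tuf0, tuf0*growth, ...). env_step is assumed to return
    a differentiable TD loss computed from q_net and target_net."""
    target_net = copy.deepcopy(q_net)
    next_sync, interval = tuf0, tuf0
    for step in range(total_steps):
        loss = env_step(q_net, target_net)   # TD loss for one batch of transitions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step + 1 >= next_sync:            # periodic, but progressively rarer, sync
            target_net.load_state_dict(q_net.state_dict())
            interval = int(interval * growth)
            next_sync += interval
```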
arXiv Detail & Related papers (2026-02-03T15:19:20Z) - Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation [45.717539734334906]
Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates. We show that existing approaches that optimize the high-dimensional initial noise suffer from severe inefficiency, as many search directions exert negligible influence on the final generation. We propose Spectral Evolution Search (SES), a plug-and-play framework for initial noise optimization that executes gradient-free evolutionary search within a low-frequency subspace.
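A toy rendering of the idea: parameterize the initial noise by a handful of low-frequency Fourier coefficients and run a simple elitist evolution strategy over just those coefficients. `reward_fn`, the subspace size `k`, and the ES settings are illustrative assumptions, not the published SES procedure.

```python
import numpy as np

def spectral_evolution_search(reward_fn, shape=(64, 64), k=8,
                              pop=16, iters=20, sigma=0.5, seed=0):
    """Search initial noise in a low-frequency subspace: only the k x k
    lowest-frequency rFFT coefficients are evolved; all other frequencies
    stay zero. reward_fn scores a full-resolution noise map (e.g. through
    a generator plus reward model, both outside this sketch)."""
    rng = np.random.default_rng(seed)
    best = rng.standard_normal((k, k))               # low-frequency coefficients

    def decode(coef):
        spec = np.zeros((shape[0], shape[1] // 2 + 1), dtype=complex)
        spec[:k, :k] = coef                          # fill only the low-frequency block
        return np.fft.irfft2(spec, s=shape)

    best_r = reward_fn(decode(best))
    for _ in range(iters):                           # elitist (1+pop) evolution strategy
        cands = best + sigma * rng.standard_normal((pop, k, k))
        rewards = [reward_fn(decode(c)) for c in cands]
        i = int(np.argmax(rewards))
        if rewards[i] > best_r:                      # keep the best offspring, if better
            best, best_r = cands[i], rewards[i]
    return decode(best), best_r
```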
arXiv Detail & Related papers (2026-02-03T07:19:39Z) - PRISM: Structured Optimization via Anisotropic Spectral Shaping [10.078746583283754]
PRISM is an efficient, low-rank second-order preconditioner. It adaptively suppresses updates in high-variance subspaces while preserving update strength in signal-dominated directions.
arXiv Detail & Related papers (2026-02-03T04:41:11Z) - FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection [19.148841575715746]
Coreset selection compresses datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. We propose FAST, the first DNN-free distribution-matching coreset selection framework. FAST significantly outperforms state-of-the-art coreset selection methods across all evaluated benchmarks, achieving an average accuracy gain of 9.12%.
arXiv Detail & Related papers (2025-11-22T09:24:57Z) - Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity [51.56484100374058]
We introduce Ringleader ASGD, the first asynchronous algorithm that attains the theoretical lower bounds for parallel computation. Our analysis further establishes that Ringleader ASGD remains optimal under arbitrary data heterogeneity and even time-varying computation speeds.
arXiv Detail & Related papers (2025-09-26T19:19:15Z) - Adaptive Deadline and Batch Layered Synchronized Federated Learning [66.93447103966439]
Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation.
arXiv Detail & Related papers (2025-05-29T19:59:18Z) - KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning [15.81250204481401]
We introduce a kernel-function-based ZO framework aimed at mitigating gradient estimation bias. KerZOO achieves comparable or superior performance to existing ZO baselines. We show that the kernel function is an effective avenue for reducing estimation bias in ZO methods.
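The sketch below shows a standard two-point zeroth-order gradient estimator with a slot for a per-direction kernel weight, to indicate where a kernel-based correction such as KerZOO's could enter; the actual kernel and bias-correction scheme used in the paper are not reproduced here.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, n_dirs=16, kernel=None, seed=None):
    """Two-point zeroth-order gradient estimate averaged over random
    Gaussian directions, with an optional per-direction kernel weight.
    The kernel slot only marks where a KerZOO-style reweighting could
    reshape the estimator."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.size)
        w = kernel(u) if kernel is not None else 1.0
        fd = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)   # directional derivative
        g += w * fd * u
    return g / n_dirs

# usage: minimize a toy quadratic with the ZO estimate in place of a true gradient
f = lambda z: float(np.sum(z ** 2))
x = np.ones(10)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)
```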
arXiv Detail & Related papers (2025-05-24T21:56:03Z) - More Optimal Fractional-Order Stochastic Gradient Descent for Non-Convex Optimization Problems [2.5971517743176915]
We propose 2SED Fractional-Order Stochastic Gradient Descent (2SEDFOSGD), which integrates the Two-Scale Effective Dimension (2SED) with FOSGD. By tracking sensitivity and effective dimensionality, 2SEDFOSGD dynamically modulates the fractional exponent to mitigate sluggish oscillations and hasten convergence.
arXiv Detail & Related papers (2025-05-05T19:27:36Z) - Adaptive Bayesian Optimization for Robust Identification of Stochastic Dynamical Systems [4.0148499400442095]
This paper deals with the identification of linear stochastic dynamical systems, where the unknowns include system coefficients and noise variances. A sample-efficient global optimization method based on Bayesian optimization is proposed. Experiments show that the EGP-based BO consistently outperforms MLE via steady-state filtering and expectation-maximization.
arXiv Detail & Related papers (2025-03-09T01:38:21Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
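For reference, the two mechanisms discussed above look like this in a plain SGD loop; the decaying stepsize, clipping threshold, and toy heavy-tailed noise model are illustrative choices, and this is not the accelerated variant from the paper.

```python
import numpy as np

def robust_sgd(grad_fn, x0, lr=0.1, clip=1.0, mode="clip", steps=1000, seed=0):
    """SGD with the two gradient-taming mechanisms analyzed above:
    'clip' rescales gradients whose norm exceeds `clip`, while
    'normalize' always steps along g / ||g||."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for t in range(steps):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g) + 1e-12
        if mode == "clip":
            g = g * min(1.0, clip / norm)      # leave small gradients untouched
        else:
            g = g / norm                       # unit-norm step direction
        x -= lr / np.sqrt(t + 1) * g           # decaying stepsize
    return x

# toy quadratic with heavy-tailed (Student-t, df=2) gradient noise
grad_fn = lambda x, rng: 2 * x + rng.standard_t(df=2, size=x.shape)
print(robust_sgd(grad_fn, x0=np.full(5, 5.0), mode="normalize"))
```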
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - FedNAR: Federated Optimization with Normalized Annealing Regularization [54.42032094044368]
We explore the choice of weight decay and identify that the weight decay value appreciably influences the convergence of existing FL algorithms.
We develop Federated optimization with Normalized Annealing Regularization (FedNAR), a plug-in that can be seamlessly integrated into any existing FL algorithm.
arXiv Detail & Related papers (2023-10-04T21:11:40Z) - Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We observe two drawbacks stemming from their training procedure and loss definition, hindering their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
arXiv Detail & Related papers (2022-05-27T11:13:17Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works attempted to artificially simulate SGN by injecting random noise to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
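A simplified sketch of the positive-negative idea: keep two momentum buffers fed by gradients from alternating iterations and combine them with a positive and a negative coefficient, which perturbs the update direction without changing its mean; the normalization used by the published PNM optimizer is omitted.

```python
import torch

@torch.no_grad()
def pnm_step(params, grads, state, lr=0.01, beta=0.9, beta0=1.0):
    """Simplified positive-negative momentum step: two momentum buffers are
    updated on alternating iterations, and the update combines the active
    buffer with weight (1 + beta0) and the stale one with weight -beta0.
    An illustrative reading of the PNM idea, not the published optimizer."""
    state['t'] = state.get('t', 0) + 1
    odd = state['t'] % 2
    for i, (w, g) in enumerate(zip(params, grads)):
        m_a = state.setdefault(('m0', i), torch.zeros_like(w))
        m_b = state.setdefault(('m1', i), torch.zeros_like(w))
        cur, other = (m_a, m_b) if odd else (m_b, m_a)
        cur.mul_(beta).add_(g, alpha=1 - beta)        # update only the active buffer
        update = (1 + beta0) * cur - beta0 * other    # positive-negative combination
        w.add_(update, alpha=-lr)
```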
arXiv Detail & Related papers (2021-03-31T16:08:06Z) - Approximate Inference for Spectral Mixture Kernel [25.087829816206813]
We propose an approximate Bayesian inference method for the spectral mixture kernel.
We optimize the variational parameters by applying a sampling-based variational inference to the derived evidence lower bound (ELBO) estimator.
The proposed inference combined with two strategies accelerates the convergence of the parameters and leads to better optimal parameters.
arXiv Detail & Related papers (2020-06-12T09:39:29Z)