Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
- URL: http://arxiv.org/abs/2502.13166v2
- Date: Mon, 29 Sep 2025 05:36:12 GMT
- Title: Mitigating Barren Plateaus in Quantum Neural Networks via an AI-Driven Submartingale-Based Framework
- Authors: Jun Zhuang, Chaowen Guan
- Abstract summary: We propose AdaInit to mitigate barren plateaus (BPs) in quantum neural networks (QNNs). AdaInit iteratively synthesizes initial parameters for QNNs that yield non-negligible gradient variance, thereby mitigating BPs. We provide rigorous theoretical analyses of the submartingale-based process and empirically validate that AdaInit consistently outperforms existing methods in maintaining higher gradient variance across various QNN scales.
- Score: 3.0617189749929348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of noisy intermediate-scale quantum (NISQ) computing, Quantum Neural Networks (QNNs) have emerged as a promising approach for various applications, yet their training is often hindered by barren plateaus (BPs), where gradient variance vanishes exponentially in terms of the qubit size. Most existing initialization-based mitigation strategies rely heavily on pre-designed static parameter distributions, thereby lacking adaptability to diverse model sizes or data conditions. To address these limitations, we propose AdaInit, a foundational framework that leverages generative models with the submartingale property to iteratively synthesize initial parameters for QNNs that yield non-negligible gradient variance, thereby mitigating BPs. Unlike conventional one-shot initialization methods, AdaInit adaptively explores the parameter space by incorporating dataset characteristics and gradient feedback, with theoretical guarantees of convergence to finding a set of effective initial parameters for QNNs. We provide rigorous theoretical analyses of the submartingale-based process and empirically validate that AdaInit consistently outperforms existing initialization methods in maintaining higher gradient variance across various QNN scales. We believe this work may initiate a new avenue to mitigate BPs.
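The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the authors' AdaInit: the generative model is replaced by a random perturbation of a single Gaussian initialization scale `sigma`, the QNN is a small simulated circuit of RY rotations and CZ gates, and all function names (`cost`, `grad_first_param`, `gradient_variance`, `adaptive_init`) are hypothetical. The accept-if-better rule only makes the recorded best gradient variance non-decreasing, which is a simple stand-in for the submartingale property and carries none of the paper's guarantees.

```python
import numpy as np

def apply_ry(state, theta, qubit, n):
    """Apply RY(theta) to one qubit of a real n-qubit statevector."""
    psi = np.moveaxis(state.reshape([2] * n), qubit, 0)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    out = np.stack([c * psi[0] - s * psi[1], s * psi[0] + c * psi[1]])
    return np.moveaxis(out, 0, qubit).reshape(-1)

def apply_cz(state, q1, q2, n):
    """Apply a controlled-Z gate: flip the sign where both qubits are 1."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[q1], idx[q2] = 1, 1
    psi[tuple(idx)] *= -1
    return psi.reshape(-1)

def cost(params, n):
    """Expectation of Z on qubit 0 after layers of RY rotations and CZ chains."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for layer in params:
        for q in range(n):
            state = apply_ry(state, layer[q], q, n)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    probs = (state ** 2).reshape([2] * n)
    marginal = probs.sum(axis=tuple(range(1, n)))  # distribution of qubit 0
    return marginal[0] - marginal[1]

def grad_first_param(params, n):
    """Parameter-shift gradient of the cost w.r.t. the first rotation angle."""
    shift = np.zeros_like(params)
    shift[0, 0] = np.pi / 2
    return 0.5 * (cost(params + shift, n) - cost(params - shift, n))

def gradient_variance(sigma, n, layers, rng, samples=64):
    """Estimate gradient variance over parameters drawn from N(0, sigma^2)."""
    grads = [grad_first_param(rng.normal(0.0, sigma, (layers, n)), n)
             for _ in range(samples)]
    return float(np.var(grads))

def adaptive_init(n=4, layers=4, iters=10, seed=0):
    """Toy search: keep a candidate scale only if it raises gradient variance."""
    rng = np.random.default_rng(seed)
    sigma = np.pi  # assumption: start from a wide Gaussian
    best = gradient_variance(sigma, n, layers, rng)
    history = [best]
    for _ in range(iters):
        candidate = sigma * np.exp(rng.normal(0.0, 0.5))  # stand-in for a generative model
        var = gradient_variance(candidate, n, layers, rng)
        if var > best:  # gradient feedback: accept only improvements
            sigma, best = candidate, var
        history.append(best)
    return sigma, history

if __name__ == "__main__":
    sigma, history = adaptive_init()
    print("selected scale:", sigma)
    print("best gradient variance per iteration:", history)
```

Because the variance is a Monte Carlo estimate from a finite number of samples, the selected scale depends on the seed and sample count; a faithful implementation would follow the paper's own generative procedure and variance definition.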
Related papers
- From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference [0.0]
We study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs). We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. We characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posteriori (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade
arXiv Detail & Related papers (2026-02-26T00:02:54Z) - Uncertainty-Preserving QBNNs: Multi-Level Quantization of SVI-Based Bayesian Neural Networks for Image Classification [9.160739594332036]
We introduce a systematic multi-level quantization framework for Variational Inference based BNNs. We demonstrate that BNNs can be quantized down to 4-bit precision while maintaining both classification accuracy and uncertainty disentanglement.
arXiv Detail & Related papers (2025-12-11T12:51:42Z) - PointNSP: Autoregressive 3D Point Cloud Generation with Next-Scale Level-of-Detail Prediction [87.33016661440202]
Autoregressive point cloud generation has long lagged behind diffusion-based approaches in quality. We propose PointNSP, a coarse-to-fine generative framework that preserves global shape structure at low resolutions. Experiments on ShapeNet show that PointNSP establishes state-of-the-art (SOTA) generation quality for the first time within the autoregressive paradigm.
arXiv Detail & Related papers (2025-10-07T06:31:02Z) - Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z) - Neural Bridge Processes [21.702709965353804]
We propose a novel method for modeling functions where inputs x act as dynamic anchors for the entire diffusion trajectory. We validate NBPs on synthetic data, EEG signal regression and image regression tasks, achieving substantial improvements over baselines.
arXiv Detail & Related papers (2025-08-10T07:44:52Z) - Partially-Supervised Neural Network Model For Quadratic Multiparametric Programming [2.765106384328772]
This study proposes a partially-supervised NN architecture that directly represents the mathematical structure of the global solution function. In contrast to generic NN training approaches, the proposed PSNN method derives a large proportion of model weights directly from the mathematical properties of the optimization problem.
arXiv Detail & Related papers (2025-06-05T20:26:18Z) - Q-MAML: Quantum Model-Agnostic Meta-Learning for Variational Quantum Algorithms [4.525216077859531]
We introduce a new framework for optimizing parameterized quantum circuits (PQCs) that employs a classical technique inspired by Model-Agnostic Meta-Learning (MAML).
Our framework features a classical neural network, called Learner, which interacts with a PQC using the output of Learner as an initial parameter.
In the adaptation phase, the framework requires only a few PQC updates to converge to a more accurate value, while the learner remains unchanged.
arXiv Detail & Related papers (2025-01-10T12:07:00Z) - Compact Multi-Threshold Quantum Information Driven Ansatz For Strongly Interactive Lattice Spin Models [0.0]
We introduce a systematic procedure for ansatz building based on approximate Quantum Mutual Information (QMI).
Our approach generates a layered-structured ansatz, where each layer's qubit pairs are selected based on their QMI values, resulting in more efficient state preparation and optimization routines.
Our results show that the Multi-QIDA method reduces the computational complexity while maintaining high precision, making it a promising tool for quantum simulations in lattice spin models.
arXiv Detail & Related papers (2024-08-05T17:07:08Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on gradient-signal-to-noise ratio (GSNR) of network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z) - Challenges of variational quantum optimization with measurement shot noise [0.0]
We study the scaling of the quantum resources to reach a fixed success probability as the problem size increases.
Our results suggest that hybrid quantum-classical algorithms should possibly avoid a brute force classical outer loop.
arXiv Detail & Related papers (2023-07-31T18:01:15Z) - AskewSGD: An Annealed interval-constrained Optimisation method to train Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state of the art methods in classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - LAWS: Look Around and Warm-Start Natural Gradient Descent for Quantum Neural Networks [11.844238544360149]
Variational quantum algorithms (VQAs) have recently received significant attention due to their promising performance in Noisy Intermediate-Scale Quantum computers (NISQ).
VQAs run on parameterized quantum circuits (PQC) with randomly initialized parameters are characterized by barren plateaus (BP), where the gradient vanishes exponentially in the number of qubits.
In this paper, we first examine quantum natural gradient (QNG), which is one of the most popular algorithms used in VQA, from the classical first-order optimization point of view.
Then, we propose a Look Around and Warm-Start (LAWS) natural gradient descent algorithm.
arXiv Detail & Related papers (2022-05-05T14:16:40Z) - FLIP: A flexible initializer for arbitrarily-sized parametrized quantum circuits [105.54048699217668]
We propose a FLexible Initializer for arbitrarily-sized Parametrized quantum circuits.
FLIP can be applied to any family of PQCs, and instead of relying on a generic set of initial parameters, it is tailored to learn the structure of successful parameters.
We illustrate the advantage of using FLIP in three scenarios: a family of problems with proven barren plateaus, PQC training to solve max-cut problem instances, and PQC training for finding the ground state energies of 1D Fermi-Hubbard models.
arXiv Detail & Related papers (2021-03-15T17:38:33Z) - A Meta-Learning Approach to the Optimal Power Flow Problem Under Topology Reconfigurations [69.73803123972297]
We propose a DNN-based OPF predictor that is trained using a meta-learning (MTL) approach.
The developed OPF-predictor is validated through simulations using benchmark IEEE bus systems.
arXiv Detail & Related papers (2020-12-21T17:39:51Z) - Filter Pre-Pruning for Improved Fine-tuning of Quantized Deep Neural Networks [0.0]
We propose a new pruning method called Pruning for Quantization (PfQ) which removes the filters that disturb the fine-tuning of the DNN.
Experiments using well-known models and datasets confirmed that the proposed method achieves higher performance with a similar model size.
arXiv Detail & Related papers (2020-11-13T04:12:54Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Optimistic Exploration even with a Pessimistic Initialisation [57.41327865257504]
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL).
In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values.
We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network.
arXiv Detail & Related papers (2020-02-26T17:15:53Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based training combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.