Symmetry-guided gradient descent for quantum neural networks
- URL: http://arxiv.org/abs/2404.06108v2
- Date: Tue, 13 Aug 2024 03:02:21 GMT
- Title: Symmetry-guided gradient descent for quantum neural networks
- Authors: Kaiming Bian, Shitao Zhang, Fei Meng, Wen Zhang, Oscar Dahlsten
- Abstract summary: We formulate the symmetry constraints into a concise mathematical form.
We design two ways to adopt the constraints into the cost function.
We call the method symmetry-guided gradient descent (SGGD).
- Score: 5.170906880400192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many supervised learning tasks have intrinsic symmetries, such as translational and rotational symmetry in image classifications. These symmetries can be exploited to enhance performance. We formulate the symmetry constraints into a concise mathematical form. We design two ways to adopt the constraints into the cost function, thereby shaping the cost landscape in favour of parameter choices which respect the given symmetry. Unlike methods that alter the neural network circuit ansatz to impose symmetry, our method only changes the classical post-processing of gradient descent, which is simpler to implement. We call the method symmetry-guided gradient descent (SGGD). We illustrate SGGD in entanglement classification of Werner states and in a binary classification task in a 2-D feature space. In both cases, the results show that SGGD can accelerate the training, improve the generalization ability, and remove vanishing gradients, especially when the training data is biased.
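The idea of folding a symmetry constraint into the cost function can be illustrated with a small sketch. The abstract does not spell out the paper's two constructions, so the code below shows only one plausible reading, a penalty term added to the data cost, and should not be taken as the authors' formulation. A classical logistic model stands in for the quantum neural network, and the reflection symmetry `reflect`, the weight `lam`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def model(params, x):
    """Toy classical stand-in for a QNN: returns an output in (0, 1)."""
    w, b = params[:2], params[2]
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def reflect(x):
    """Assumed symmetry: the label is invariant under x1 -> -x1."""
    return x * np.array([-1.0, 1.0])

def data_cost(params, x, y):
    """Mean squared error on the labelled training data."""
    return np.mean((model(params, x) - y) ** 2)

def symmetry_cost(params, x_sym):
    """Penalty for giving different outputs on symmetry-related inputs.

    x_sym needs no labels, so it can be sampled freely from feature space.
    """
    diff = model(params, x_sym) - model(params, reflect(x_sym))
    return np.mean(diff ** 2)

def total_cost(params, x, y, x_sym, lam=1.0):
    """Data cost plus a weighted symmetry penalty (illustrative form)."""
    return data_cost(params, x, y) + lam * symmetry_cost(params, x_sym)

def numerical_grad(f, params, eps=1e-6):
    """Central finite-difference gradient; adequate for three parameters."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

def train(x, y, x_sym, lam=1.0, lr=0.5, steps=500, seed=0):
    """Plain gradient descent on the symmetry-augmented cost."""
    rng = np.random.default_rng(seed)
    params = rng.normal(scale=0.1, size=3)
    for _ in range(steps):
        cost = lambda p: total_cost(p, x, y, x_sym, lam)
        params -= lr * numerical_grad(cost, params)
    return params
```

Only the cost evaluated during the classical update changes here; the model itself is untouched, which mirrors the abstract's point that the circuit ansatz is not altered. Setting `lam=0` recovers ordinary gradient descent on the data cost, so the two can be compared on the same biased training set.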
Related papers
- Enhancing lattice kinetic schemes for fluid dynamics with Lattice-Equivariant Neural Networks [79.16635054977068]
We present a new class of equivariant neural networks, dubbed Lattice-Equivariant Neural Networks (LENNs)
Our approach develops within a recently introduced framework aimed at learning neural network-based surrogate models for Lattice Boltzmann collision operators.
Our work opens the way towards practical utilization of machine learning-augmented Lattice Boltzmann CFD in real-world simulations.
arXiv Detail & Related papers (2024-05-22T17:23:15Z) - Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent [8.347295051171525]
We show that gradient noise creates a systematic interplay of parameters $\theta$ along the degenerate direction to a unique-independent fixed point $\theta^*$.
These points are referred to as the noise equilibria because, at these points, noise contributions from different directions are balanced and aligned.
We show that the balance and alignment of gradient noise can serve as a novel alternative mechanism for explaining important phenomena such as progressive sharpening/flattening and representation formation within neural networks.
arXiv Detail & Related papers (2024-02-11T13:00:04Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Learning Layer-wise Equivariances Automatically using Gradients [66.81218780702125]
Convolutions encode equivariance symmetries into neural networks leading to better generalisation performance.
Symmetries provide fixed hard constraints on the functions a network can represent, need to be specified in advance, and cannot be adapted.
Our goal is to allow flexible symmetry constraints that can automatically be learned from data using gradients.
arXiv Detail & Related papers (2023-10-09T20:22:43Z) - Symmetry Induces Structure and Constraint of Learning [0.0]
We unveil the importance of the loss function symmetries in affecting, if not deciding, the learning behavior of machine learning models.
Common instances of mirror symmetries in deep learning include rescaling, rotation, and permutation symmetry.
We show that the theoretical framework can explain intriguing phenomena, such as the loss of plasticity and various collapse phenomena in neural networks.
arXiv Detail & Related papers (2023-09-29T02:21:31Z) - Symmetries in the dynamics of wide two-layer neural networks [0.0]
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias).
We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics.
arXiv Detail & Related papers (2022-11-16T08:59:26Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics [26.485269202381932]
Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning.
We show that any such symmetry imposes stringent geometric constraints on gradients and Hessians, leading to an associated conservation law.
We apply tools from finite difference methods to derive modified gradient flow, a differential equation that better approximates the numerical trajectory taken by SGD at finite learning rates.
arXiv Detail & Related papers (2020-12-08T20:33:30Z) - Channel-Directed Gradients for Optimization of Convolutional Neural
Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z) - Understanding Gradient Clipping in Private SGD: A Geometric Perspective [68.61254575987013]
Deep learning models are increasingly popular in many machine learning applications where the training data may contain sensitive information.
Many learning systems now incorporate differential privacy by training their models with (differentially) private SGD.
A key step in each private SGD update is gradient clipping that shrinks the gradient of an individual example whenever its L2 norm exceeds some threshold.
arXiv Detail & Related papers (2020-06-27T19:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.