Symmetry-Breaking Descent for Invariant Cost Functionals
- URL: http://arxiv.org/abs/2505.13578v1
- Date: Mon, 19 May 2025 15:06:31 GMT
- Title: Symmetry-Breaking Descent for Invariant Cost Functionals
- Authors: Mikhail Osipov
- Abstract summary: We study the problem of reducing a task cost functional $W(S)$, defined over Sobolev-class signals $S$, when the cost is invariant under a global symmetry group $G \subset \mathrm{Diff}(M)$. We propose a variational method that exploits the symmetry structure to construct explicit, symmetry-breaking deformations of the input signal.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the problem of reducing a task cost functional $W(S)$, defined over Sobolev-class signals $S$, when the cost is invariant under a global symmetry group $G \subset \mathrm{Diff}(M)$ and accessible only as a black-box. Such scenarios arise in machine learning, imaging, and inverse problems, where cost metrics reflect model outputs or performance scores but are non-differentiable and model-internal. We propose a variational method that exploits the symmetry structure to construct explicit, symmetry-breaking deformations of the input signal. A gauge field $\phi$, obtained by minimizing an auxiliary energy functional, induces a deformation $h = A_\phi[S]$ that generically lies transverse to the $G$-orbit of $S$. We prove that, under mild regularity, the cost $W$ strictly decreases along this direction -- either via Clarke subdifferential descent or by escaping locally flat plateaus. The exceptional set of degeneracies has zero Gaussian measure. Our approach requires no access to model gradients or labels and operates entirely at test time. It provides a principled tool for optimizing invariant cost functionals via Lie-algebraic variational flows, with applications to black-box models and symmetry-constrained tasks.
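The abstract describes the method only at a high level, so the following is a minimal, illustrative Python sketch of the descent scheme it outlines, not the paper's construction: circular shifts of a 1-D signal stand in for the symmetry group $G \subset \mathrm{Diff}(M)$, and the auxiliary energy, gauge-field update, and deformation operator $A_\phi$ below are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
S = np.sin(np.linspace(0.0, 4.0 * np.pi, n)) + 0.1 * rng.normal(size=n)

def W(S):
    """Black-box cost: depends only on the magnitude spectrum, hence it is
    invariant under circular shifts (the stand-in symmetry group here)."""
    mag = np.abs(np.fft.rfft(S))
    return float(np.sum((mag[1:6] - 3.0) ** 2))

def aux_energy(phi, S):
    """Toy auxiliary energy: roughness of phi plus a coupling to the signal
    gradient, so its minimizer is adapted to S (an assumption of this sketch)."""
    dphi = np.diff(phi, append=phi[:1])
    dS = np.diff(S, append=S[:1])
    return float(np.sum(dphi ** 2) - np.sum(phi * dS))

def minimize_phi(S, iters=500, lr=0.05):
    """Gradient descent on aux_energy; for this quadratic choice the gradient
    is available in closed form (negative discrete Laplacian term minus dS)."""
    phi = np.zeros_like(S)
    dS = np.diff(S, append=S[:1])
    for _ in range(iters):
        lap = np.roll(phi, 1) - 2.0 * phi + np.roll(phi, -1)
        grad = -2.0 * lap - dS          # d/dphi of aux_energy
        phi -= lr * grad
    return phi

def deformation(phi, S):
    """h = A_phi[S]: advect S along the gauge field phi (a first-order,
    Lie-derivative-like term), which generically breaks the shift symmetry."""
    dS = np.diff(S, append=S[:1])
    return phi * dS

# Symmetry-breaking descent: keep a step only if the black-box cost drops.
tau = 0.5
for _ in range(20):
    phi = minimize_phi(S)
    h = deformation(phi, S)
    if W(S + tau * h) < W(S):
        S = S + tau * h
    else:
        tau *= 0.5                      # shrink the step on flat directions
print("final cost:", W(S))
```

The accept-or-shrink loop is only a stand-in for the Clarke-subdifferential descent argument in the abstract: the candidate direction $h$ is evaluated through the black box alone, with no access to model gradients or labels.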
Related papers
- Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift [41.60879054101201]
Learning models whose predictions are invariant across multiple environments is a promising approach.
We show that learned invariant models can underperform under certain conditions.
We propose a practical way to implement WRI by learning the correlations $p(X_{\text{inv}})$ and the model parameters simultaneously.
arXiv Detail & Related papers (2024-07-25T23:27:10Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- the effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - EDGI: Equivariant Diffusion for Planning with Embodied Agents [17.931089055248062]
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries.
We introduce the Equivariant diffuser for Generating Interactions (EDGI), an algorithm for planning and model-based reinforcement learning.
EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
arXiv Detail & Related papers (2023-03-22T09:19:39Z) - Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras
from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
arXiv Detail & Related papers (2023-01-13T16:25:25Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}\left(n + \sqrt{n}/\epsilon^2\right)$ calls to the stochastic oracle.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Linear programming with unitary-equivariant constraints [2.0305676256390934]
Unitary equivariance is a natural symmetry that occurs in many contexts in physics and mathematics.
We show that under additional symmetry assumptions, this problem reduces to a linear program that can be solved in time that does not scale with $d$.
We also outline a possible route for extending our method to general unitary-equivariant semidefinite programs.
arXiv Detail & Related papers (2022-07-12T17:37:04Z) - Machine learning a manifold [0.0]
We propose a simple method to identify a continuous Lie algebra symmetry in a dataset through regression by an artificial neural network.
Our proposal takes advantage of the $\mathcal{O}(\epsilon^2)$ scaling of the output variable under infinitesimal symmetry transformations of the input variables.
arXiv Detail & Related papers (2021-12-14T19:00:00Z) - What Happens after SGD Reaches Zero Loss? --A Mathematical Framework [35.31946061894308]
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning.
This paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991).
It yields some new results: (1) a global analysis of the implicit bias valid for $\eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $\eta^{-1.6}$ steps, and (2) allowing arbitrary noise covariance.
arXiv Detail & Related papers (2021-10-13T17:50:46Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
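As a companion to the last entry above, here is a minimal sketch of the single-neuron setting it describes, assuming a ReLU activation and full-batch gradient descent on the empirical square loss; the Gaussian data model, noise level, initialization, and step size are illustrative choices, not taken from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 10, 2000
w_star = rng.normal(size=d) / np.sqrt(d)                     # target neuron
X = rng.normal(size=(m, d))                                  # Gaussian inputs
y = np.maximum(X @ w_star, 0.0) + 0.05 * rng.normal(size=m)  # noisy labels

def risk(w):
    """Empirical square loss of the neuron x -> relu(<w, x>)."""
    return 0.5 * float(np.mean((np.maximum(X @ w, 0.0) - y) ** 2))

w = 0.1 * rng.normal(size=d)   # small random init (zero init would stall)
lr = 0.2
for _ in range(1000):
    z = X @ w
    pred = np.maximum(z, 0.0)
    # (Sub)gradient of the square loss, taking relu'(z) = 1{z > 0}.
    grad = (X * ((z > 0) * (pred - y))[:, None]).mean(axis=0)
    w -= lr * grad
print("final empirical risk:", risk(w))
```

The printed empirical risk plays the role of the population risk in the cited guarantee; in the agnostic setting the noise term means the achievable risk is bounded below by the best single neuron's risk, which the $O(\mathsf{OPT}^{1/2})+\epsilon$ bound is stated relative to.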