Symmetry-Breaking Descent for Invariant Cost Functionals
- URL: http://arxiv.org/abs/2505.13578v2
- Date: Sat, 30 Aug 2025 17:41:32 GMT
- Title: Symmetry-Breaking Descent for Invariant Cost Functionals
- Authors: Mikhail Osipov
- Abstract summary: We study the problem of reducing a task cost functional $W : H^s(M) \to \mathbb{R}$, not assumed continuous or differentiable. We show that symmetry-breaking deformations of the signal can reduce the cost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the problem of reducing a task cost functional $W : H^s(M) \to \mathbb{R}$, not assumed continuous or differentiable, defined over Sobolev-class signals $S \in H^s(M)$, in the presence of a global symmetry group $G \subset \mathrm{Diff}(M)$. The group acts on signals by pullback, and the cost $W$ is invariant under this action. Such scenarios arise in machine learning and related optimization tasks, where performance metrics may be discontinuous or model-internal. We propose a variational method that exploits the symmetry structure to construct explicit deformations of the input signal. A deformation control field $\phi: M \to \mathbb{R}^d$, obtained by minimizing an auxiliary energy functional, induces a flow that generically lies in the normal space (with respect to the $L^2$ inner product) to the $G$-orbit of $S$, and hence is a natural candidate to cross the decision boundary of the $G$-invariant cost. We analyze two variants of the coupling term: (1) purely geometric, independent of $W$, and (2) weakly coupled to $W$. Under mild conditions, we show that symmetry-breaking deformations of the signal can reduce the cost. Our approach requires no gradient backpropagation or training labels and operates entirely at test time. It provides a principled tool for optimizing discontinuous invariant cost functionals via Lie-algebraic variational flows.
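Since the method needs no gradients or labels, it is easy to prototype. Below is a minimal 1-D sketch under assumptions chosen here purely for illustration (periodic domain $M = S^1$, translation group $G$, a thresholded $L^2$ norm as the discontinuous invariant cost, and a smoothed random field standing in for the auxiliary-energy minimizer); it is a sketch of the idea, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
S = np.sin(x) + 0.3 * np.sin(3 * x)       # test signal on M = S^1

def W(sig):
    # Black-box, discontinuous, translation-invariant cost: a hard
    # threshold on the L2 norm (no gradients available).
    return float(np.linalg.norm(sig) > 11.7)

def normal_field(sig):
    # Smooth random control field phi, projected L2-orthogonally to the
    # orbit tangent S' so the induced step breaks the translation symmetry.
    phi = np.convolve(rng.standard_normal(n), np.ones(9) / 9, mode="same")
    t = np.gradient(sig, x)               # tangent to the G-orbit at sig
    phi -= (phi @ t) / (t @ t) * t        # remove the pure-translation component
    return phi / np.linalg.norm(phi)

eps, cost = 1.0, W(S)
print("initial cost:", cost)
for _ in range(100):                      # derivative-free, test-time descent
    cand = S + eps * normal_field(S)      # one step of the deformation flow
    if W(cand) < cost:                    # accept only if the invariant cost drops
        S, cost = cand, W(cand)
print("final cost:", cost)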
Related papers
- Anisotropic local law for non-separable sample covariance matrices [10.181748307494608]
We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^{N} g_i g_i^*$, where the random vectors $g_1, \ldots, g_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $g = \sigma(Xw)$ arising in machine learning, and Gaussian measures with
arXiv Detail & Related papers (2026-02-20T03:28:51Z) - Discrete Double-Bracket Flows for Isotropic-Noise Invariant Eigendecomposition [7.186083931122418]
We study matrix-free eigendecomposition under a matrix-vector product (MVP) oracle. Standard approximation methods either use fixed steps that couple stability to $\|C_k\|$, or adaptive steps that slow down due to vanishing updates. We establish global convergence via a strict-saddle analysis of the diagonalization objective and an input-to-state stability analysis, with complexity scaling as $O(\|C_e\|^2/\epsilon^2)$ under trace-free perturbations.
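For context, the classical discrete double-bracket (Brockett) iteration underlying this family of methods can be sketched in a few lines; this is the textbook dense-matrix flow, not the paper's matrix-free MVP-oracle algorithm or its step-size scheme.

```python
# Discrete double-bracket (Brockett) flow for symmetric eigendecomposition.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
H = (A + A.T) / 2                        # symmetric test matrix
N = np.diag(np.arange(6.0))              # fixed diagonal target ordering

eta = 0.02                               # fixed step; stability couples to ||[H, N]||
for _ in range(3000):
    B = H @ N - N @ H                    # bracket [H, N] is skew-symmetric
    U = expm(-eta * B)                   # hence U is orthogonal
    H = U @ H @ U.T                      # isospectral step: dH/dt = [H, [H, N]]

print(np.round(np.diag(H), 4))           # ascending eigenvalues of H
print(np.round(np.linalg.eigvalsh((A + A.T) / 2), 4))  # reference spectrum
```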
arXiv Detail & Related papers (2026-02-14T13:09:29Z) - Parameter-free Algorithms for the Stochastically Extended Adversarial Model [59.81852138768642]
Existing approaches for the Stochastically Extended Adversarial (SEA) model require prior knowledge of problem-specific parameters, such as the domain diameter $D$ and the Lipschitz constant $G$ of the loss functions. We develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters.
arXiv Detail & Related papers (2025-10-06T10:53:37Z) - Automatic Discovery of One-Parameter Subgroups of Lie Groups: Compact and Non-Compact Cases of $\mathbf{SO(n)}$ and $\mathbf{SL(n)}$ [7.771878859878091]
We introduce a novel framework for the automatic discovery of one-parameter subgroups of $SO(3)$ and, more generally, $SO(n)$. Our method utilizes the standard Jordan form of skew-symmetric matrices, which define the Lie algebra of $SO(n)$. The effectiveness of the proposed approach is demonstrated through tasks such as pendulum modeling, moment of inertia prediction, top quark tagging, and invariant regression.
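As a reminder of the object being discovered, the following standard-linear-algebra snippet (not the paper's discovery pipeline) checks that a skew-symmetric generator $A$ yields a one-parameter subgroup $t \mapsto \exp(tA)$ of $SO(n)$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A = X - X.T                              # skew-symmetric: an element of so(4)

for t in (0.0, 0.5, 1.0):
    R = expm(t * A)                      # group element R(t)
    assert np.allclose(R @ R.T, np.eye(4), atol=1e-10)  # orthogonal
    assert np.isclose(np.linalg.det(R), 1.0)            # determinant one
# Homomorphism property R(s) R(t) = R(s + t):
assert np.allclose(expm(0.3 * A) @ expm(0.7 * A), expm(A), atol=1e-10)
```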
arXiv Detail & Related papers (2025-09-26T11:31:38Z) - Emergence and scaling laws in SGD learning of shallow neural networks [64.48316762675141]
We study the complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with $P$ neurons on isotropic Gaussian data. We provide a precise analysis of the SGD dynamics for training a student two-layer network to minimize the mean squared error (MSE) objective.
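A bare-bones instance of this setting (toy scale, chosen here for illustration; not the paper's regime or analysis) is online SGD fitting a planted two-layer teacher under squared loss:

```python
import numpy as np

rng = np.random.default_rng(4)
d, P, lr = 20, 4, 0.05
W_t = rng.standard_normal((P, d)) / np.sqrt(d)   # planted teacher directions
W_s = rng.standard_normal((P, d)) / np.sqrt(d)   # student first layer

relu = lambda z: np.maximum(z, 0.0)
f = lambda W, x: relu(W @ x).sum() / P           # second layer fixed at 1/P

for _ in range(100_000):                         # one fresh Gaussian sample per step
    x = rng.standard_normal(d)
    err = f(W_s, x) - f(W_t, x)                  # residual of the MSE objective
    g = err / P * (W_s @ x > 0)[:, None] * x[None, :]   # gradient of 0.5 * err^2
    W_s -= lr * g

test = rng.standard_normal((1000, d))
print("test MSE:", np.mean([(f(W_s, x) - f(W_t, x)) ** 2 for x in test]))
```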
arXiv Detail & Related papers (2025-04-28T16:58:55Z) - Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift [41.60879054101201]
Learning models whose predictions are invariant under multiple environments is a promising approach.
We show that invariant models learned this way can underperform under certain conditions.
We propose a practical way to implement WRI by learning the correlations $p(X_{\text{inv}})$ and the model parameters simultaneously.
arXiv Detail & Related papers (2024-07-25T23:27:10Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Uniform $\mathcal{C}^k$ Approximation of $G$-Invariant and Antisymmetric
Functions, Embedding Dimensions, and Polynomial Representations [0.0]
We show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, and $k$.
We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$.
arXiv Detail & Related papers (2024-03-02T23:19:10Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - EDGI: Equivariant Diffusion for Planning with Embodied Agents [17.931089055248062]
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries.
We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for planning and model-based reinforcement learning.
EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
arXiv Detail & Related papers (2023-03-22T09:19:39Z) - Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras
from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
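A toy distillation of the idea (assumed setup: the invariant is the Euclidean norm and only the first-order invariance defect is penalized; the paper's fully connected architecture is not reproduced) recovers an $so(2)$ generator by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((256, 2))        # data from the invariant task
G = rng.standard_normal((2, 2))
G /= np.linalg.norm(G)                   # unit Frobenius norm rules out G = 0

for _ in range(300):
    r = np.einsum("ni,ij,nj->n", X, G, X)          # first-order defect x^T G x
    grad = 2 * np.einsum("n,ni,nj->ij", r, X, X) / len(X)
    G -= 0.1 * grad                                # descend the defect energy
    G /= np.linalg.norm(G)                         # stay on the unit sphere

print(np.round(G + G.T, 3))              # ~0: the learned generator is skew-symmetric
```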
arXiv Detail & Related papers (2023-01-13T16:25:25Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}(n + \sqrt{n}/\epsilon^2)$ calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Linear programming with unitary-equivariant constraints [2.0305676256390934]
Unitary equivariance is a natural symmetry that occurs in many contexts in physics and mathematics.
We show that under additional symmetry assumptions, this problem reduces to a linear program that can be solved in time that does not scale in $d$.
We also outline a possible route for extending our method to general unitary-equivariant semidefinite programs.
arXiv Detail & Related papers (2022-07-12T17:37:04Z) - Machine learning a manifold [0.0]
We propose a simple method to identify a continuous Lie algebra symmetry in a dataset through regression by an artificial neural network.
Our proposal takes advantage of the $\mathcal{O}(\epsilon^2)$ scaling of the output variable under infinitesimal symmetry transformations on the input variables.
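A quick numeric check of that signature (toy setup assumed here): for a rotation-invariant $f$ and a true symmetry generator $A$, the change $f(x + \epsilon A x) - f(x)$ is $\mathcal{O}(\epsilon^2)$, while a non-symmetry direction gives an $\mathcal{O}(\epsilon)$ change.

```python
import numpy as np

f = lambda x: x @ x                          # rotation-invariant: squared norm
x = np.array([1.0, 2.0])
A = np.array([[0.0, 1.0], [-1.0, 0.0]])      # so(2) generator (true symmetry)
B = np.eye(2)                                # scaling direction (not a symmetry)

for eps in (1e-1, 1e-2, 1e-3):
    dA = f(x + eps * A @ x) - f(x)           # shrinks ~100x per decade: O(eps^2)
    dB = f(x + eps * B @ x) - f(x)           # shrinks ~10x per decade: O(eps)
    print(f"eps={eps:.0e}  symmetry: {dA:.2e}  non-symmetry: {dB:.2e}")
```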
arXiv Detail & Related papers (2021-12-14T19:00:00Z) - What Happens after SGD Reaches Zero Loss? --A Mathematical Framework [35.31946061894308]
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning.
This paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991).
It yields some new results: (1) a global analysis of the implicit bias valid for $\eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $\eta^{-1.6}$ steps, and (2) allowing arbitrary noise covariance.
arXiv Detail & Related papers (2021-10-13T17:50:46Z) - Generalized moduli of continuity under irregular or random deformations via multiscale analysis [0.0]
We prove that for signals in multiresolution approximation spaces $U_s$ at scale $s$, stability in $L^2$ holds in the regime $\|\tau\|_{L^\infty}/s \ll 1$. Instability occurs when $\|\tau\|_{L^\infty}/s \gg 1$, and we provide a sharp upper bound for the growth rate.
arXiv Detail & Related papers (2021-04-24T16:16:30Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y = f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
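One concrete low-complexity model of this kind (an illustrative truncated continued fraction; the paper's exact parameterization is assumed, not reproduced) can be evaluated from the innermost term outward:

```python
import numpy as np

def cf_eval(coeffs, x):
    # Evaluate a0 + x / (a1 + x / (a2 + ... )) for an array of inputs x.
    val = np.full_like(x, coeffs[-1], dtype=float)
    for a in coeffs[-2::-1]:
        val = a + x / val
    return val

coeffs = [1.0, 2.0, 3.0, 4.0]                # model parameters to be fitted
x = np.linspace(0.5, 5.0, 5)
print(cf_eval(coeffs, x))
```

Fitting the coefficients on the training set $S$ and then evaluating at inputs beyond its range is what makes this a model for extrapolation rather than interpolation.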
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.