Symmetry-Breaking Descent for Invariant Cost Functionals
- URL: http://arxiv.org/abs/2505.13578v2
- Date: Sat, 30 Aug 2025 17:41:32 GMT
- Title: Symmetry-Breaking Descent for Invariant Cost Functionals
- Authors: Mikhail Osipov
- Abstract summary: We study the problem of reducing a task cost functional $W : H^s(M) \to \mathbb{R}$, not assumed continuous or differentiable. We show that symmetry-breaking deformations of the signal can reduce the cost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the problem of reducing a task cost functional $W : H^s(M) \to \mathbb{R}$, not assumed continuous or differentiable, defined over Sobolev-class signals $S \in H^s(M)$, in the presence of a global symmetry group $G \subset \mathrm{Diff}(M)$. The group acts on signals by pullback, and the cost $W$ is invariant under this action. Such scenarios arise in machine learning and related optimization tasks, where performance metrics may be discontinuous or model-internal. We propose a variational method that exploits the symmetry structure to construct explicit deformations of the input signal. A deformation control field $\phi: M \to \mathbb{R}^d$, obtained by minimizing an auxiliary energy functional, induces a flow that generically lies in the normal space (with respect to the $L^2$ inner product) to the $G$-orbit of $S$, and hence is a natural candidate to cross the decision boundary of the $G$-invariant cost. We analyze two variants of the coupling term: (1) purely geometric, independent of $W$, and (2) weakly coupled to $W$. Under mild conditions, we show that symmetry-breaking deformations of the signal can reduce the cost. Our approach requires no gradient backpropagation or training labels and operates entirely at test time. It provides a principled tool for optimizing discontinuous invariant cost functionals via Lie-algebraic variational flows.
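Since the method needs no gradients or labels, it is easy to prototype. Below is a minimal 1-D sketch under assumptions chosen here purely for illustration (periodic domain $M = S^1$, translation group $G$, a thresholded $L^2$ norm as the discontinuous invariant cost, and a smoothed random field standing in for the auxiliary-energy minimizer); it is a sketch of the idea, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
S = np.sin(x) + 0.3 * np.sin(3 * x)       # test signal on M = S^1

def W(sig):
    # Black-box, discontinuous, translation-invariant cost: a hard
    # threshold on the L2 norm (no gradients available).
    return float(np.linalg.norm(sig) > 11.7)

def normal_field(sig):
    # Smooth random control field phi, projected L2-orthogonally to the
    # orbit tangent S' so the induced step breaks the translation symmetry.
    phi = np.convolve(rng.standard_normal(n), np.ones(9) / 9, mode="same")
    t = np.gradient(sig, x)               # tangent to the G-orbit at sig
    phi -= (phi @ t) / (t @ t) * t        # remove the pure-translation component
    return phi / np.linalg.norm(phi)

eps, cost = 1.0, W(S)
print("initial cost:", cost)
for _ in range(100):                      # derivative-free, test-time descent
    cand = S + eps * normal_field(S)      # one step of the deformation flow
    if W(cand) < cost:                    # accept only if the invariant cost drops
        S, cost = cand, W(cand)
print("final cost:", cost)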
Related papers
- Anisotropic local law for non-separable sample covariance matrices [10.181748307494608]
We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^{N} g_i g_i^*$, where the random vectors $g_1, \ldots, g_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $g = \sigma(Xw)$ arising in machine learning, and Gaussian measures with
arXiv Detail & Related papers (2026-02-20T03:28:51Z) - Discrete Double-Bracket Flows for Isotropic-Noise Invariant Eigendecomposition [7.186083931122418]
We study matrix-free eigendecomposition under a matrix-vector product (MVP) oracle. Standard approximation methods either use fixed steps that couple stability to $\|C_k\|$, or adaptive steps that slow down due to vanishing updates. We establish global convergence via a strict-saddle analysis of the diagonalization objective and an input-to-state stability analysis, with complexity scaling as $O(\|C_e\|^2/\epsilon^2)$ under trace-free perturbations.
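For context, the classical discrete double-bracket (Brockett) iteration underlying this family of methods can be sketched in a few lines; this is the textbook dense-matrix flow, not the paper's matrix-free MVP-oracle algorithm or its step-size scheme.

```python
# Discrete double-bracket (Brockett) flow for symmetric eigendecomposition.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
H = (A + A.T) / 2                        # symmetric test matrix
N = np.diag(np.arange(6.0))              # fixed diagonal target ordering

eta = 0.02                               # fixed step; stability couples to ||[H, N]||
for _ in range(3000):
    B = H @ N - N @ H                    # bracket [H, N] is skew-symmetric
    U = expm(-eta * B)                   # hence U is orthogonal
    H = U @ H @ U.T                      # isospectral step: dH/dt = [H, [H, N]]

print(np.round(np.diag(H), 4))           # ascending eigenvalues of H
print(np.round(np.linalg.eigvalsh((A + A.T) / 2), 4))  # reference spectrum
```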
arXiv Detail & Related papers (2026-02-14T13:09:29Z) - Parameter-free Algorithms for the Stochastically Extended Adversarial Model [59.81852138768642]
Existing approaches for the Stochastically Extended Adversarial (SEA) model require prior knowledge of problem-specific parameters, such as the domain diameter $D$ and the Lipschitz constant $G$ of the loss functions. We develop parameter-free methods by leveraging the Optimistic Online Newton Step (OONS) algorithm to eliminate the need for these parameters.
arXiv Detail & Related papers (2025-10-06T10:53:37Z) - Automatic Discovery of One-Parameter Subgroups of Lie Groups: Compact and Non-Compact Cases of $\mathbf{SO(n)}$ and $\mathbf{SL(n)}$ [7.771878859878091]
We introduce a novel framework for the automatic discovery of one-parameter subgroups of $SO(3)$ and, more generally, $SO(n)$. Our method utilizes the standard Jordan form of skew-symmetric matrices, which define the Lie algebra of $SO(n)$. The effectiveness of the proposed approach is demonstrated through tasks such as pendulum modeling, moment of inertia prediction, top quark tagging, and invariant regression.
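As a reminder of the object being discovered, the following standard-linear-algebra snippet (not the paper's discovery pipeline) checks that a skew-symmetric generator $A$ yields a one-parameter subgroup $t \mapsto \exp(tA)$ of $SO(n)$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 4))
A = X - X.T                              # skew-symmetric: an element of so(4)

for t in (0.0, 0.5, 1.0):
    R = expm(t * A)                      # group element R(t)
    assert np.allclose(R @ R.T, np.eye(4), atol=1e-10)  # orthogonal
    assert np.isclose(np.linalg.det(R), 1.0)            # determinant one
# Homomorphism property R(s) R(t) = R(s + t):
assert np.allclose(expm(0.3 * A) @ expm(0.7 * A), expm(A), atol=1e-10)
```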
arXiv Detail & Related papers (2025-09-26T11:31:38Z) - Emergence and scaling laws in SGD learning of shallow neural networks [64.48316762675141]
We study the complexity of online stochastic gradient descent (SGD) for learning a two-layer neural network with $P$ neurons on isotropic Gaussian data. We provide a precise analysis of the SGD dynamics for training a student two-layer network to minimize the mean squared error (MSE) objective.
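A bare-bones instance of this setting (toy scale, chosen here for illustration; not the paper's regime or analysis) is online SGD fitting a planted two-layer teacher under squared loss:

```python
import numpy as np

rng = np.random.default_rng(4)
d, P, lr = 20, 4, 0.05
W_t = rng.standard_normal((P, d)) / np.sqrt(d)   # planted teacher directions
W_s = rng.standard_normal((P, d)) / np.sqrt(d)   # student first layer

relu = lambda z: np.maximum(z, 0.0)
f = lambda W, x: relu(W @ x).sum() / P           # second layer fixed at 1/P

for _ in range(100_000):                         # one fresh Gaussian sample per step
    x = rng.standard_normal(d)
    err = f(W_s, x) - f(W_t, x)                  # residual of the MSE objective
    g = err / P * (W_s @ x > 0)[:, None] * x[None, :]   # gradient of 0.5 * err^2
    W_s -= lr * g

test = rng.standard_normal((1000, d))
print("test MSE:", np.mean([(f(W_s, x) - f(W_t, x)) ** 2 for x in test]))
```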
arXiv Detail & Related papers (2025-04-28T16:58:55Z) - Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift [41.60879054101201]
Learning models whose predictions are invariant under multiple environments is a promising approach.
We show that invariant models learned this way can underperform under certain conditions.
We propose a practical way to implement WRI by learning the correlations $p(X_{\text{inv}})$ and the model parameters simultaneously.
arXiv Detail & Related papers (2024-07-25T23:27:10Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Uniform $\mathcal{C}^k$ Approximation of $G$-Invariant and Antisymmetric
Functions, Embedding Dimensions, and Polynomial Representations [0.0]
We show that the embedding dimension required is independent of the regularity of the target function, the accuracy of the desired approximation, and $k$.
We also provide upper and lower bounds on $K$ and show that $K$ is independent of the regularity of the target function, the desired approximation accuracy, and $k$.
arXiv Detail & Related papers (2024-03-02T23:19:10Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and that the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - EDGI: Equivariant Diffusion for Planning with Embodied Agents [17.931089055248062]
Embodied agents operate in a structured world, often solving tasks with spatial, temporal, and permutation symmetries.
We introduce the Equivariant Diffuser for Generating Interactions (EDGI), an algorithm for planning and model-based reinforcement learning.
EDGI is substantially more sample efficient and generalizes better across the symmetry group than non-equivariant models.
arXiv Detail & Related papers (2023-03-22T09:19:39Z) - Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras
from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
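A toy distillation of the idea (assumed setup: the invariant is the Euclidean norm and only the first-order invariance defect is penalized; the paper's fully connected architecture is not reproduced) recovers an $so(2)$ generator by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((256, 2))        # data from the invariant task
G = rng.standard_normal((2, 2))
G /= np.linalg.norm(G)                   # unit Frobenius norm rules out G = 0

for _ in range(300):
    r = np.einsum("ni,ij,nj->n", X, G, X)          # first-order defect x^T G x
    grad = 2 * np.einsum("n,ni,nj->ij", r, X, X) / len(X)
    G -= 0.1 * grad                                # descend the defect energy
    G /= np.linalg.norm(G)                         # stay on the unit sphere

print(np.round(G + G.T, 3))              # ~0: the learned generator is skew-symmetric
```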
arXiv Detail & Related papers (2023-01-13T16:25:25Z) - Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum
Minimization [52.25843977506935]
We propose an adaptive variance-reduction method, called AdaSpider, for $L$-smooth, non-convex functions with a finite-sum structure.
In doing so, we are able to compute an $\epsilon$-stationary point with $\tilde{O}(n + \sqrt{n}/\epsilon^2)$ calls.
arXiv Detail & Related papers (2022-11-03T14:41:46Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Linear programming with unitary-equivariant constraints [2.0305676256390934]
Unitary equivariance is a natural symmetry that occurs in many contexts in physics and mathematics.
We show that under additional symmetry assumptions, this problem reduces to a linear program that can be solved in time that does not scale in $d$.
We also outline a possible route for extending our method to general unitary-equivariant semidefinite programs.
arXiv Detail & Related papers (2022-07-12T17:37:04Z) - Machine learning a manifold [0.0]
We propose a simple method to identify a continuous Lie algebra symmetry in a dataset through regression by an artificial neural network.
Our proposal takes advantage of the $\mathcal{O}(\epsilon^2)$ scaling of the output variable under infinitesimal symmetry transformations on the input variables.
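A quick numeric check of that signature (toy setup assumed here): for a rotation-invariant $f$ and a true symmetry generator $A$, the change $f(x + \epsilon A x) - f(x)$ is $\mathcal{O}(\epsilon^2)$, while a non-symmetry direction gives an $\mathcal{O}(\epsilon)$ change.

```python
import numpy as np

f = lambda x: x @ x                          # rotation-invariant: squared norm
x = np.array([1.0, 2.0])
A = np.array([[0.0, 1.0], [-1.0, 0.0]])      # so(2) generator (true symmetry)
B = np.eye(2)                                # scaling direction (not a symmetry)

for eps in (1e-1, 1e-2, 1e-3):
    dA = f(x + eps * A @ x) - f(x)           # shrinks ~100x per decade: O(eps^2)
    dB = f(x + eps * B @ x) - f(x)           # shrinks ~10x per decade: O(eps)
    print(f"eps={eps:.0e}  symmetry: {dA:.2e}  non-symmetry: {dB:.2e}")
```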
arXiv Detail & Related papers (2021-12-14T19:00:00Z) - What Happens after SGD Reaches Zero Loss? --A Mathematical Framework [35.31946061894308]
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning.
This paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991).
It yields some new results: (1) a global analysis of the implicit bias valid for $\eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $\eta^{-1.6}$ steps, and (2) allowing arbitrary noise covariance.
arXiv Detail & Related papers (2021-10-13T17:50:46Z) - Generalized moduli of continuity under irregular or random deformations via multiscale analysis [0.0]
We prove that for signals in multiresolution approximation spaces $U_s$ at scale $s$, stability in $L^2$ holds in the regime $\|\tau\|_{L^\infty}/s \ll 1$. Instability occurs when $\|\tau\|_{L^\infty}/s \gg 1$, and we provide a sharp upper bound for the growth rate.
arXiv Detail & Related papers (2021-04-24T16:16:30Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y = f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
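One concrete low-complexity model of this kind (an illustrative truncated continued fraction; the paper's exact parameterization is assumed, not reproduced) can be evaluated from the innermost term outward:

```python
import numpy as np

def cf_eval(coeffs, x):
    # Evaluate a0 + x / (a1 + x / (a2 + ... )) for an array of inputs x.
    val = np.full_like(x, coeffs[-1], dtype=float)
    for a in coeffs[-2::-1]:
        val = a + x / val
    return val

coeffs = [1.0, 2.0, 3.0, 4.0]                # model parameters to be fitted
x = np.linspace(0.5, 5.0, 5)
print(cf_eval(coeffs, x))
```

Fitting the coefficients on the training set $S$ and then evaluating at inputs beyond its range is what makes this a model for extrapolation rather than interpolation.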
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.