Related papers: Matrix Sensing with Kernel Optimal Loss: Robustness and Optimization Landscape

URL: http://arxiv.org/abs/2511.02122v1
Date: Mon, 03 Nov 2025 23:22:37 GMT
Title: Matrix Sensing with Kernel Optimal Loss: Robustness and Optimization Landscape
Authors: Xinyuan Song, Jiaye Teng, Ziye Ma
Abstract summary: In traditional regression tasks, mean squared error (MSE) loss is a common choice, but it can be unreliable for non-Gaussian or heavy-tailed noise. We adopt a robust loss formulation based on a kernel-based estimate of the residual density and maximize the estimated log-likelihood.
Score: 10.674539579679871
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper we study how the choice of loss function in non-convex optimization problems affects robustness and the optimization landscape, through the study of noisy matrix sensing. In traditional regression tasks, mean squared error (MSE) loss is a common choice, but it can be unreliable for non-Gaussian or heavy-tailed noise. To address this issue, we adopt a robust loss based on nonparametric regression, which uses a kernel-based estimate of the residual density and maximizes the estimated log-likelihood. This robust formulation coincides with the MSE loss under Gaussian errors but remains stable under more general settings. We further examine how this robust loss reshapes the optimization landscape by analyzing the upper bound on the restricted isometry property (RIP) constants required for spurious local minima to disappear. Through theoretical and empirical analysis, we show that this new loss excels at handling large noise and remains robust across diverse noise distributions. This work offers initial insights into enhancing the robustness of machine learning tasks through simply changing the loss, guided by an intuitive and broadly applicable analytical framework.
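
The abstract describes the loss only in words, so the following is a minimal sketch of one way to read it, not the authors' formulation. It assumes a symmetric low-rank model M = X X^T with measurements y_i = <A_i, M> + noise, a Gaussian kernel with a fixed bandwidth h, and a density estimate that includes each residual's own kernel term. The names `sensing_residuals`, `kernel_nll`, and `mse` are hypothetical.

```python
import numpy as np

def sensing_residuals(X, A, y):
    """Residuals r_i = y_i - <A_i, X X^T> for sensing matrices A of shape (n, d, d)."""
    M = X @ X.T
    return y - np.einsum("nij,ij->n", A, M)

def kernel_nll(r, h=1.0):
    """Negative mean log of a Gaussian kernel density estimate at each residual.

    Assumed form (not taken from the paper):
        p_hat(r_i) = (1/n) * sum_j N(r_i - r_j; 0, h^2)
    """
    z = (r[:, None] - r[None, :]) / h
    log_k = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(h)
    # log-sum-exp over j, computed stably with the logaddexp ufunc
    log_p = np.logaddexp.reduce(log_k, axis=1) - np.log(r.shape[0])
    return -np.mean(log_p)

def mse(r):
    """Plain mean squared error of the residuals, for comparison."""
    return np.mean(r**2)

# Toy problem: noiseless measurements plus a single gross outlier.
rng = np.random.default_rng(0)
d, rank, n = 5, 2, 200
X_true = rng.standard_normal((d, rank))
A = rng.standard_normal((n, d, d))
y = np.einsum("nij,ij->n", A, X_true @ X_true.T)
y[0] += 100.0  # corrupt one measurement

r = sensing_residuals(X_true, A, y)
print("MSE at the true factor:       ", mse(r))
print("kernel NLL at the true factor:", kernel_nll(r))
```

In this toy setup the single corrupted measurement dominates the MSE evaluated at the true factor, while the kernel-based objective moves only slightly, because each residual's density estimate is bounded below by its own kernel term. Whether the paper uses this exact estimator, bandwidth rule, or normalization is not stated in the abstract.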

Related papers

Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging [4.063392865490957]
Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees. We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting. We learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees.
arXiv Detail & Related papers (2026-01-23T10:33:45Z)
Overcoming the Loss Conditioning Bottleneck in Optimization-Based PDE Solvers: A Novel Well-Conditioned Loss Function [1.6135205846394396]
PDE solvers that minimize scalar loss functions have gained increasing attention in recent years. Such methods converge much more slowly than classical iterative solvers and are commonly regarded as inefficient. This work provides a theoretical insight, attributing the inefficiency to the use of the mean squared error (MSE) loss. By tuning a weight parameter, the proposed loss flexibly modulates the condition number between the original system and its normal equations, while reducing to the MSE loss in the limiting case.
arXiv Detail & Related papers (2025-07-24T10:17:02Z)
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level optimizations (BLO). We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework, where an adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
Support Vector Machines with the Hard-Margin Loss: Optimal Training via Combinatorial Benders' Cuts [8.281391209717105]
We show how to train the hard-margin SVM model to global optimality. We introduce an iterative sampling and decomposition algorithm that solves the problem.
arXiv Detail & Related papers (2022-07-15T18:21:51Z)
Robust Matrix Completion with Heavy-tailed Noise [0.5837881923712392]
This paper studies low-rank matrix completion in the presence of heavy-tailed, possibly asymmetric noise. We adopt an adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors. We prove that under merely a second moment condition on the error, the Euclidean error falls geometrically fast until achieving a minimax-optimal statistical estimation error.
arXiv Detail & Related papers (2022-06-09T04:48:48Z)
Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint that guarantees low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Beyond variance reduction: Understanding the true impact of baselines on policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates. We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain. We establish sharp information theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$. We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.