Related papers: Differentiable Zero-One Loss via Hypersimplex Projections

URL: http://arxiv.org/abs/2602.23336v1
Date: Thu, 26 Feb 2026 18:41:31 GMT
Title: Differentiable Zero-One Loss via Hypersimplex Projections
Authors: Camilo Gomez, Pengyang Wang, Liansheng Tang
Abstract summary: We introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance. We show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems.
Score: 14.382224834970557
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss, long considered the gold standard for classification performance, yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the n,k-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in large-batch training.
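
For readers unfamiliar with the hypersimplex mentioned in the abstract, the following is a minimal sketch of the plain Euclidean projection onto the set {x in [0,1]^n : sum(x) = k}, computed by bisection on a scalar shift. It is not the paper's Soft-Binary-Argmax operator, which the abstract describes as smooth. The function name project_hypersimplex, the parameter iters, and the bisection approach are assumptions made for illustration only.

```python
def project_hypersimplex(z, k, iters=100):
    """Euclidean projection of z onto {x in [0,1]^n : sum(x) = k}.

    Illustrative sketch only: the projection has the form
    x_i = clip(z_i - tau, 0, 1), and tau is found by bisection.
    """
    n = len(z)
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")

    def clipped_sum(tau):
        # Sum of the shifted, clipped coordinates; nonincreasing in tau.
        return sum(min(1.0, max(0.0, zi - tau)) for zi in z)

    lo = min(z) - 1.0  # every coordinate clips to 1, so the sum is n >= k
    hi = max(z)        # every coordinate clips to 0, so the sum is 0 <= k
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if clipped_sum(mid) > k:
            lo = mid   # shift too small: the sum is still above k
        else:
            hi = mid   # shift large enough: the sum is at or below k
    tau = 0.5 * (lo + hi)
    return [min(1.0, max(0.0, zi - tau)) for zi in z]
```

For example, project_hypersimplex([2.0, 0.5, -1.0, 0.1], 2) returns approximately [1.0, 0.7, 0.0, 0.3]: the entries sum to k = 2 and keep the ordering of the inputs. Unlike the operator described in the abstract, this projection is only piecewise linear.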

Related papers

A Trainable Optimizer [18.195022468462753]
We present a framework that jointly trains the full gradient estimator and the trainable weights of the model. Pseudo-linear TO incurs negligible computational overhead, requiring only minimal additional multiplications. Experiments demonstrate that TO methods converge faster than benchmark algorithms.
arXiv Detail & Related papers (2025-08-03T14:06:07Z)
A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Nonconvex optimization is central to machine learning, but the general framework of nonconvexity yields weak convergence guarantees that are too pessimistic. We introduce a novel unified assumption for nonconvex algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z)
qNBO: quasi-Newton Meets Bilevel Optimization [26.0555315825777]
Bilevel optimization, addressing challenges in hierarchical learning tasks, has gained significant interest in machine learning. We introduce a general framework to address these computational challenges in a coordinated manner. Specifically, we leverage quasi-Newton algorithms to accelerate the resolution of the lower-level problem while efficiently approximating the inverse Hessian-vector product.
arXiv Detail & Related papers (2025-02-03T05:36:45Z)
Towards Differentiable Multilevel Optimization: A Gradient-Based Approach [1.6114012813668932]
This paper introduces a novel gradient-based approach for multilevel optimization. Our method significantly reduces computational complexity while improving both solution accuracy and convergence speed. To the best of our knowledge, this is one of the first algorithms to provide a general version of implicit differentiation.
arXiv Detail & Related papers (2024-10-15T06:17:59Z)
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level optimizations (BLO). We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver. This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly. It enables an end-to-end training scheme in which the dual objective serves as a loss function, and solution estimates move toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z)
GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus. By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we can simultaneously obtain precise certificates and leverage advanced and powerful neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z)
Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets. We leverage the structure of the objective by handling the differentiable and non-differentiable components separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z)
Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems. Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.