Related papers: A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks

A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks

URL: http://arxiv.org/abs/2510.25366v2
Date: Thu, 30 Oct 2025 08:16:40 GMT
Title: A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks
Authors: Tomas Hrycej, Bernhard Bermeitinger, Massimo Pavone, Götz-Henrik Wiegand, Siegfried Handschuh,
Abstract summary: The key task of machine learning is to efficiently grounded the loss function that measures the model fit to the data.<n>We propose a novel framework to minimize non-worldity convexity from initial optimum gradient minimizing the swap.
Score: 1.7701764220380956
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of the loss function. The most decisive among these properties is the convexity or non-convexity of the loss function. The fact that the loss function can have, and frequently has, non-convex regions has led to a widespread commitment to non-convex methods such as Adam. However, a local minimum implies that, in some environment around it, the function is convex. In this environment, second-order minimizing methods such as the Conjugate Gradient (CG) give a guaranteed superlinear convergence. We propose a novel framework grounded in the hypothesis that loss functions in real-world tasks swap from initial non-convexity to convexity towards the optimum. This is a property we leverage to design an innovative two-phase optimization algorithm. The presented algorithm detects the swap point by observing the gradient norm dependence on the loss. In these regions, non-convex (Adam) and convex (CG) algorithms are used, respectively. Computing experiments confirm the hypothesis that this simple convexity structure is frequent enough to be practically exploited to substantially improve convergence and accuracy.

Related papers

Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe [61.68406997155879]
State-of-the-art Large Language Model (LLM) pruning methods operate layer-wise, minimizing the per-layer pruning error on a small dataset to avoid full retraining.<n>Existing methods hence rely on greedy convexs that ignore the weight interactions in the pruning objective.<n>Our method drastically reduces the per-layer pruning error, outperforms strong baselines on state-of-the-art GPT architectures, and remains memory-efficient.
arXiv Detail & Related papers (2025-10-15T16:13:44Z)
Deep Loss Convexification for Learning Iterative Models [11.36644967267829]
Iterative methods such as iterative closest point (ICP) for point cloud registration often suffer from bad local optimality. We propose learning to form a convex landscape around each ground truth.
arXiv Detail & Related papers (2024-11-16T01:13:04Z)
Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time [45.72323731094864]
In this paper, we study the optimality gap between two-layer ReLULU networks regularized with weight decay and their convex relaxations. Our study sheds new light on understanding why local methods work well.
arXiv Detail & Related papers (2024-02-06T01:29:35Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
Accelerated Neural Network Training with Rooted Logistic Objectives [13.400503928962756]
We derive a novel sequence of em strictly convex functions that are at least as strict as logistic loss. Our results illustrate that training with rooted loss function is converged faster and gains performance improvements.
arXiv Detail & Related papers (2023-10-05T20:49:48Z)
Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity. We obtain $mathcalO(log V_T)$, $mathcalO(d log V_T)$ and $hatmathcalO(sqrtV_T)$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z)
Computationally Efficient and Statistically Optimal Robust High-Dimensional Linear Regression [15.389011827844572]
High-tailed linear regression under heavy-tailed noise or objective corruption is challenging, both computationally statistically. In this paper, we introduce an algorithm for both the noise Gaussian or heavy 1 + epsilon regression problems.
arXiv Detail & Related papers (2023-05-10T14:31:03Z)
Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range machine learning. This paper develops two novel decentralized minimax optimization algorithms for the non-strongly-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z)
The Geometry and Calculus of Losses [10.451984251615512]
We develop the theory of loss functions for binary and multiclass classification and class probability estimation problems. The perspective provides three novel opportunities. It enables the development of a fundamental relationship between losses and (anti)-norms that appears to have not been noticed before. Second, it enables the development of a calculus of losses induced by the calculus of convex sets. Third, the perspective leads to a natural theory of polar'' loss functions, which are derived from the polar dual of the convex set defining the loss.
arXiv Detail & Related papers (2022-09-01T05:57:19Z)
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks [59.822151945132525]
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function. We study the impact of the location of the collocation points on the trainability of these models. We propose a novel adaptive collocation scheme which progressively allocates more collocation points to areas where the model is making higher errors.
arXiv Detail & Related papers (2022-07-08T18:17:06Z)
Universal Online Convex Optimization Meets Second-order Bounds [74.0120666722487]
We propose a simple strategy for universal online convex optimization. The key idea is to construct a set of experts to process the original online functions, and deploy a meta-algorithm over the linearized losses. In this way, we can plug in off-the-shelf online solvers as black-box experts to deliver problem-dependent regret bounds.
arXiv Detail & Related papers (2021-05-08T11:43:49Z)
Aligning Partially Overlapping Point Sets: an Inner Approximation Algorithm [80.15123031136564]
We propose a robust method to align point sets where there is no prior information about the value of the transformation. Our algorithm does not need regularization on transformation, and thus can handle the situation where there is no prior information about the values of the transformations. Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z)
Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-regular regularizers with convergence guarantees. Our results demonstrate high gain compared to the full problem solver for both block-coordinates or a gradient solver.
arXiv Detail & Related papers (2020-06-24T07:40:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.