Related papers: Optimizers Qualitatively Alter Solutions And We Should Leverage This

Optimizers Qualitatively Alter Solutions And We Should Leverage This

URL: http://arxiv.org/abs/2507.12224v1
Date: Wed, 16 Jul 2025 13:33:31 GMT
Title: Optimizers Qualitatively Alter Solutions And We Should Leverage This
Authors: Razvan Pascanu, Clare Lyle, Ionut-Vlad Modoranu, Naima Elosegui Borras, Dan Alistarh, Petar Velickovic, Sarath Chandar, Soham De, James Martens,
Abstract summary: Deep Neural Networks (DNNs) can not guarantee convergence to a unique global minimum of the loss when using only local information, such as SGD.<n>We argue that the community should aim at understanding the biases of already existing methods, as well as aim to build new DNNs with the explicit intent of inducing certain properties of the solution.
Score: 62.662640460717476
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Due to the nonlinear nature of Deep Neural Networks (DNNs), one can not guarantee convergence to a unique global minimum of the loss when using optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iteration, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizers design as a critical lever that complements the roles of architecture and data in shaping model outcomes.

Related papers

On the Learnability of Offline Model-Based Optimization: A Ranking Perspective [28.667834180549686]
offline model-based optimization (MBO) seeks to discover high-performing designs using only a fixed dataset of past evaluations.<n>Most existing methods rely on learning a surrogate model via regression and implicitly assume that good predictive accuracy leads to good optimization performance.<n>We argue that offline optimization is fundamentally a problem of ranking high-quality designs rather than accurate value prediction.
arXiv Detail & Related papers (2026-03-04T12:45:41Z)
Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.<n>We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation [2.5960620227199342]
We focus on optimizing the max@k metric, a continuous generalization of pass@k.<n>We extend our derivations to the off-policy updates, a common element in modern RLVR algorithms, that allows better sample efficiency.
arXiv Detail & Related papers (2025-10-27T14:47:30Z)
Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization. We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
From Function to Distribution Modeling: A PAC-Generative Approach to Offline Optimization [30.689032197123755]
This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of offline" data examples. Instead of learning and then optimizing the unknown objective function, we take on a less intuitive but more direct view that optimization can be thought of as a process of sampling from a generative model.
arXiv Detail & Related papers (2024-01-04T01:32:50Z)
Fast & Fair: Efficient Second-Order Robust Optimization for Fairness in Machine Learning [0.0]
This project explores adversarial training techniques to develop fairer Deep Neural Networks (DNNs) DNNs are susceptible to inheriting bias with respect to sensitive attributes such as race and gender, which can lead to life-altering outcomes. We propose a robust optimization problem, which we demonstrate can improve fairness in several datasets.
arXiv Detail & Related papers (2024-01-04T01:02:55Z)
Unsupervised Learning for Combinatorial Optimization Needs Meta-Learning [14.86600327306136]
A general framework of unsupervised learning for optimization (CO) is to train a neural network (NN) whose output gives a problem solution by directly optimizing the CO objective. We propose a new objective of unsupervised learning for CO where the goal of learning is to search for good initialization for future problem instances rather than give direct solutions. We observe that even just the initial solution given by our model before fine-tuning can significantly outperform the baselines under various evaluation settings.
arXiv Detail & Related papers (2023-01-08T22:14:59Z)
Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
offline data-driven decision-making involves synthesizing optimized decisions with no active interaction. A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good. In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
arXiv Detail & Related papers (2022-11-21T11:01:37Z)
DEBOSH: Deep Bayesian Shape Optimization [48.80431740983095]
We propose a novel uncertainty-based method tailored to shape optimization. It enables effective BO and increases the quality of the resulting shapes beyond that of state-of-the-art approaches.
arXiv Detail & Related papers (2021-09-28T11:01:42Z)
Persistent Neurons [4.061135251278187]
We propose a trajectory-based strategy that optimize the learning task using information from previous solutions. Persistent neurons can be regarded as a method with gradient informed bias where individual updates are corrupted by deterministic error terms. We evaluate the full and partial persistent model and show it can be used to boost the performance on a range of NN structures.
arXiv Detail & Related papers (2020-07-02T22:36:49Z)
Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in a complex training model pipeline results in predictions of iteration of unobserved decision making. We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations. Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization. It reduced the computational time by 2 5 orders of magnitude compared with directly using methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.