Empirical Tests of Optimization Assumptions in Deep Learning
- URL: http://arxiv.org/abs/2407.01825v1
- Date: Mon, 1 Jul 2024 21:56:54 GMT
- Title: Empirical Tests of Optimization Assumptions in Deep Learning
- Authors: Hoang Tran, Qinzi Zhang, Ashok Cutkosky
- Abstract summary: This paper develops new empirical metrics to track the key quantities that must be controlled in theoretical analysis.
All of our tested assumptions fail to reliably capture optimization performance.
This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
- Score: 41.05664717242051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a significant gap between our theoretical understanding of optimization algorithms used in deep learning and their practical performance. Theoretical development usually focuses on proving convergence guarantees under a variety of different assumptions, which are themselves often chosen based on a rough combination of intuitive match to practice and analytical convenience. The theory/practice gap may then arise because of the failure to prove a theorem under such assumptions, or because the assumptions do not reflect reality. In this paper, we carefully measure the degree to which these assumptions are capable of explaining modern optimization algorithms by developing new empirical metrics that closely track the key quantities that must be controlled in theoretical analysis. All of our tested assumptions (including typical modern assumptions based on bounds on the Hessian) fail to reliably capture optimization performance. This highlights a need for new empirical verification of analytical assumptions used in theoretical analysis.
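For intuition about what tracking such quantities can look like in practice, here is a minimal sketch (my illustration, not the paper's actual metrics) that estimates a local gradient-Lipschitz (smoothness) constant between consecutive SGD iterates; under a standard L-smoothness assumption this ratio should stay bounded along the training trajectory.

```python
# Minimal sketch (not the paper's metrics): estimate a local smoothness
# constant L_hat = ||g(x') - g(x)|| / ||x' - x|| between consecutive
# SGD iterates, evaluated on the same mini-batch. Under an L-smoothness
# assumption, L_hat should stay bounded along the trajectory.
import torch

def flat_grad(loss, params):
    """Concatenate gradients of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def local_smoothness(model, loss_fn, batch, lr=0.01):
    params = [p for p in model.parameters() if p.requires_grad]
    sizes = [p.numel() for p in params]
    x, y = batch
    g_old = flat_grad(loss_fn(model(x), y), params)
    old = [p.detach().clone() for p in params]
    with torch.no_grad():                      # take one SGD step
        for p, g in zip(params, torch.split(g_old, sizes)):
            p -= lr * g.reshape(p.shape)
    g_new = flat_grad(loss_fn(model(x), y), params)
    with torch.no_grad():                      # restore the old weights
        for p, o in zip(params, old):
            p.copy_(o)
    step_norm = lr * g_old.norm()              # ||x' - x||
    return ((g_new - g_old).norm() / (step_norm + 1e-12)).item()
```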
Related papers
- Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.
We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
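To make the unfolding idea concrete, here is a toy LISTA-style sketch (a generic textbook example of deep unfolding, not code from the article): a few ISTA iterations for sparse recovery are unrolled into layers whose step sizes and soft-thresholds are learned end-to-end.

```python
# Toy deep-unfolding sketch (generic LISTA-style example, not from the
# article): unroll ISTA iterations for min ||Ax - y||^2/2 + lam*||x||_1
# into a network with learnable per-layer step sizes and thresholds.
import torch
import torch.nn as nn

class UnfoldedISTA(nn.Module):
    def __init__(self, A, num_layers=5):
        super().__init__()
        self.A = A                                         # (m, n) matrix
        self.step = nn.Parameter(torch.full((num_layers,), 0.1))
        self.thresh = nn.Parameter(torch.full((num_layers,), 0.01))

    def forward(self, y):                                  # y: (batch, m)
        x = torch.zeros(y.shape[0], self.A.shape[1])
        for t, lam in zip(self.step, self.thresh):
            grad = (x @ self.A.T - y) @ self.A             # grad of data term
            x = x - t * grad                               # gradient step
            x = torch.sign(x) * torch.clamp(x.abs() - lam, min=0)  # shrink
        return x
```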
arXiv Detail & Related papers (2025-12-03T13:16:35Z)
- Multi-objective Pseudo Boolean Functions in Runtime Analysis: A Review [3.3472202807168774]
We survey commonly used multi-objective functions in the theory domain and systematically review their features, limitations, and implications for practical use.
We present several new functions with more realistic features, such as local optimality and nonlinearity of the Pareto front, built by simply mixing and matching classical single-objective functions.
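As a toy illustration of that mix-and-match construction (a classical textbook example, not one of the paper's new functions), the bi-objective LOTZ benchmark pairs LeadingOnes with its mirror image TrailingZeros:

```python
# Toy illustration (not one of the paper's new functions): build a
# bi-objective pseudo-Boolean function by pairing two classical
# single-objective functions, as in the well-known LOTZ benchmark.

def leading_ones(x):
    """Number of consecutive 1-bits from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def trailing_zeros(x):
    """Number of consecutive 0-bits from the right."""
    count = 0
    for bit in reversed(x):
        if bit != 0:
            break
        count += 1
    return count

def lotz(x):
    """Bi-objective LOTZ: both objectives are maximized simultaneously."""
    return (leading_ones(x), trailing_zeros(x))

print(lotz([1, 1, 0, 1, 0, 0]))  # (2, 2)
```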
arXiv Detail & Related papers (2025-03-24T21:42:33Z)
- Near-optimal Active Reconstruction [3.037563407333583]
We design an algorithm for the Next Best View (NBV) problem in the context of active object reconstruction.
We rigorously derive sublinear bounds for the cumulative regret of our algorithm, which guarantees near-optimality.
We evaluate the performance of our algorithm empirically within our simulation framework.
arXiv Detail & Related papers (2025-03-24T09:17:53Z)
- A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Nonconvex optimization is central to machine learning, but general nonconvexity yields weak convergence guarantees that are too pessimistic compared to practice.
We introduce a novel unified parametric assumption for nonconvex optimization algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z)
- A Generalization Result for Convergence in Learning-to-Optimize [4.112909937203119]
Learning-to-optimize leverages machine learning to accelerate optimization algorithms.
We develop a new proof strategy to study convergence in learning-to-optimize.
arXiv Detail & Related papers (2024-10-10T08:17:04Z)
- A Practical Theory of Generalization in Selectivity Learning [8.268822578361824]
Query-driven machine learning models have emerged as a promising estimation technique for query selectivities.
We bridge the gaps between state-of-the-art (SOTA) theory, based on the Probably Approximately Correct (PAC) learning framework, and practice.
We show that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory.
arXiv Detail & Related papers (2024-09-11T05:10:32Z)
- Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants perform on par with flat-minima optimizers such as SAM, while requiring half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and its noise-enabled variants.
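For context on the gradient-evaluation count (a generic sketch of the standard SAM update, not this paper's code): SAM first computes a gradient to find a worst-case weight perturbation and then computes a second gradient at the perturbed weights, so each step costs two gradient evaluations where SGD costs one.

```python
# Generic sketch of one SAM step (standard formulation, not this
# paper's code), showing the two gradient evaluations per update.
import torch

def sam_step(model, loss_fn, batch, lr=0.1, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    x, y = batch

    # 1st gradient evaluation: at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    with torch.no_grad():
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p += e                       # move to the perturbed point
            p.grad = None

    # 2nd gradient evaluation: at the perturbed weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p -= e                       # undo the perturbation
            p -= lr * p.grad             # descend using the SAM gradient
            p.grad = None
```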
arXiv Detail & Related papers (2024-03-01T14:55:22Z)
- Uncertainty Regularized Evidential Regression [5.874234972285304]
The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer theory.
Specific activation functions must be employed to enforce non-negative values, a constraint that compromises model performance.
This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it.
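To make the non-negativity constraint concrete, here is a generic sketch of a standard evidential regression head (common practice in the literature, not this paper's proposed improvement), where softplus enforces non-negative Normal-Inverse-Gamma evidence parameters:

```python
# Generic sketch of a standard evidential regression head (not this
# paper's improvement): softplus enforces the non-negativity of the
# Normal-Inverse-Gamma evidence parameters (nu, alpha, beta).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 4)  # gamma, nu, alpha, beta

    def forward(self, h):
        gamma, v, a, b = self.linear(h).chunk(4, dim=-1)
        nu = F.softplus(v)                   # evidence nu >= 0
        alpha = F.softplus(a) + 1.0          # alpha > 1
        beta = F.softplus(b)                 # beta >= 0
        return gamma, nu, alpha, beta
```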
arXiv Detail & Related papers (2024-01-03T01:18:18Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results on multiple datasets offer compelling support for our theoretical assertions.
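For background on the quantile-regression component (a standard formulation, not the paper's full method), neural quantile regressors are typically trained with the pinball loss:

```python
# Standard pinball (quantile) loss used to train neural quantile
# regressors (generic background, not this paper's method).
import torch

def pinball_loss(pred, target, tau):
    """Asymmetric loss whose minimizer is the tau-quantile of target given x."""
    err = target - pred
    return torch.mean(torch.maximum(tau * err, (tau - 1) * err))

# Example: tau = 0.9 trains a regressor for the 90th percentile.
```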
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pairwise preference feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior: From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and an empirical case study of the conditions under which, and the extent to which, these meta-learning guarantees improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- A theoretical and empirical study of new adaptive algorithms with additional momentum steps and shifted updates for stochastic non-convex optimization [0.0]
Adaptive optimization algorithms are thought to be a key pillar behind the success of the Deep Learning field.
In this paper we introduce adaptive momentum techniques for different non-smooth objective problems.
arXiv Detail & Related papers (2021-10-16T09:47:57Z)
- Can convolutional ResNets approximately preserve input distances? A frequency analysis perspective [31.897568775099558]
We show that the theoretical link between the regularisation scheme used and bi-Lipschitzness is only valid under conditions which do not hold in practice.
We present a simple constructive algorithm to search for counterexamples to the distance-preservation condition.
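A generic way to probe distance preservation empirically (my sketch, not the paper's constructive search algorithm) is to sample input pairs and track the ratio of output to input distances; a bi-Lipschitz network keeps this ratio within fixed bounds:

```python
# Generic empirical check of approximate distance preservation (my
# sketch, not the paper's constructive search algorithm): sample input
# pairs and record the ratio ||f(x1) - f(x2)|| / ||x1 - x2||. A
# bi-Lipschitz f keeps this ratio within fixed bounds [1/L, L].
import torch

def distance_ratios(f, input_shape, n_pairs=100):
    ratios = []
    with torch.no_grad():
        for _ in range(n_pairs):
            x1 = torch.randn(1, *input_shape)
            x2 = torch.randn(1, *input_shape)
            num = (f(x1) - f(x2)).flatten().norm()
            den = (x1 - x2).flatten().norm()
            ratios.append((num / den).item())
    return min(ratios), max(ratios)
```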
arXiv Detail & Related papers (2021-06-04T13:12:42Z)
- Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing or large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also offer insights into increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
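As a toy illustration of such a schedule (my sketch of the general idea, not the paper's algorithm), each stage halves the step size while pushing the first-moment momentum parameter toward one:

```python
# Toy illustration of a stagewise schedule (my sketch of the general
# idea, not the paper's algorithm): at each stage the step size is cut
# and the first-moment momentum parameter is raised toward 1.
def stagewise_schedule(num_stages, lr0=0.1, beta0=0.9):
    for stage in range(num_stages):
        lr = lr0 / (2 ** stage)                    # stagewise decreasing step size
        beta = 1.0 - (1.0 - beta0) / (2 ** stage)  # momentum -> 1
        yield stage, lr, beta

for stage, lr, beta in stagewise_schedule(3):
    print(f"stage {stage}: lr={lr:.4f}, beta={beta:.4f}")
# stage 0: lr=0.1000, beta=0.9000
# stage 1: lr=0.0500, beta=0.9500
# stage 2: lr=0.0250, beta=0.9750
```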
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel optimizer named stocBiO, featuring a sample-efficient hypergradient estimator.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Beyond variance reduction: Understanding the true impact of baselines on policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)