Proximity to Losslessly Compressible Parameters
- URL: http://arxiv.org/abs/2306.02834v2
- Date: Thu, 23 May 2024 23:52:59 GMT
- Title: Proximity to Losslessly Compressible Parameters
- Authors: Matthew Farrugia-Roberts
- Abstract summary: In neural networks, lossless compressibility means that an identical function can be implemented with fewer hidden units.
In the setting of single-hidden-layer hyperbolic tangent networks, we define the rank of a parameter as the minimum number of hidden units required to implement the same function.
We show that the problem of tightly bounding the proximate rank of a parameter is NP-complete.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To better understand complexity in neural networks, we theoretically investigate the idealised phenomenon of lossless network compressibility, whereby an identical function can be implemented with fewer hidden units. In the setting of single-hidden-layer hyperbolic tangent networks, we define the rank of a parameter as the minimum number of hidden units required to implement the same function. We give efficient formal algorithms for optimal lossless compression and computing the rank of a parameter. Losslessly compressible parameters are atypical, but their existence has implications for nearby parameters. We define the proximate rank of a parameter as the rank of the most compressible parameter within a small L-infinity neighbourhood. We give an efficient greedy algorithm for bounding the proximate rank of a parameter, and show that the problem of tightly bounding the proximate rank is NP-complete. These results lay a foundation for future theoretical and empirical work on losslessly compressible parameters and their neighbours.
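The compression operations behind these definitions are easy to state concretely. As a minimal sketch (not the paper's formal algorithm), the standard lossless reductions for a single-hidden-layer tanh network are: drop units with zero outgoing weight, fold units with zero incoming weights (which compute a constant) into the output bias, and merge units whose incoming weights and bias coincide up to sign, since tanh is odd. The parameter layout and helper names below are illustrative assumptions.

```python
import numpy as np

def close(u, w, tol):
    """Componentwise (L-infinity) closeness test."""
    return np.max(np.abs(np.asarray(u, float) - np.asarray(w, float))) <= tol

def compress_tanh_network(W, b, v, c, tol=0.0):
    """One pass of lossless reduction for f(x) = c + sum_i v[i]*tanh(W[i] @ x + b[i]).
    W: (h, d) incoming weights, b: (h,) biases, v: (h,) outgoing weights, c: output bias.
    With tol=0 only exact coincidences are reduced; a full reduction would repeat
    this pass until no rule applies (merged outgoing weights can cancel to zero)."""
    W, b, v = np.asarray(W, float), np.asarray(b, float), np.asarray(v, float)
    keep_W, keep_b, keep_v = [], [], []
    for i in range(len(v)):
        if abs(v[i]) <= tol:                      # rule 1: unit contributes nothing
            continue
        if np.all(np.abs(W[i]) <= tol):           # rule 2: constant unit tanh(b[i])
            c += v[i] * np.tanh(b[i])
            continue
        for j in range(len(keep_v)):              # rule 3: merge sign-symmetric twins
            if close([*W[i], b[i]], [*keep_W[j], keep_b[j]], tol):
                keep_v[j] += v[i]
                break
            if close([*W[i], b[i]], [*(-keep_W[j]), -keep_b[j]], tol):
                keep_v[j] -= v[i]                 # tanh(-z) = -tanh(z)
                break
        else:
            keep_W.append(W[i]); keep_b.append(b[i]); keep_v.append(v[i])
    return np.array(keep_W), np.array(keep_b), np.array(keep_v), c
```

Iterating this pass until no rule applies is the kind of optimal lossless compression the abstract describes, and the surviving unit count corresponds to the rank. The proximate rank instead asks how small that count can be made by moving every coordinate of the parameter by at most epsilon, which is where the greedy bound and the NP-completeness result come in.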
Related papers
- Hyperparameter Loss Surfaces Are Simple Near their Optima [50.74035795378814]
We develop a technique based on random search to uncover the complex loss surface.
Within this regime, the best scores from random search take on a new distribution we discover.
From these features, we derive a new law for random search that can explain and extrapolate its convergence.
These new tools enable new analyses, such as confidence intervals for the best possible performance.
arXiv Detail & Related papers (2025-10-03T04:52:27Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
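As a rough illustration of the merge-into-the-frozen-backbone idea (not ALoRE's actual parameterization, which this summary does not spell out), a sum of Kronecker-factored low-rank experts can be folded into a frozen weight matrix after adaptation, leaving no extra parameters in the forward pass; all shapes, ranks, and the scaling factor below are illustrative assumptions.

```python
import numpy as np

def merge_kronecker_experts(W_frozen, experts, scale=1.0):
    """Fold a sum of Kronecker-factored experts into a frozen weight:
        W <- W + scale * sum_i kron(A_i, B_i).
    Each pair must satisfy (A_i rows * B_i rows, A_i cols * B_i cols) == W.shape."""
    delta = sum(np.kron(A, B) for A, B in experts)
    assert delta.shape == W_frozen.shape, "expert factors must tile the frozen weight"
    return W_frozen + scale * delta

# Illustrative usage: a 768x768 frozen projection and two low-rank experts.
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))
experts = []
for _ in range(2):
    A = rng.standard_normal((12, 12))
    U, V = rng.standard_normal((64, 4)), rng.standard_normal((64, 4))
    experts.append((A, U @ V.T))      # B_i = U V^T is rank-4, so kron(A_i, B_i) is low rank
W_merged = merge_kronecker_experts(W, experts, scale=0.1)   # same shape, no extra params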
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
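The gradient-free setting means the optimizer only sees (possibly noisy) function values. The sketch below shows the standard two-point zero-order estimator plugged into a plain SGD loop to illustrate that oracle model; it is not the paper's accelerated method, does not exploit high-order smoothness, and the step-size schedule is an illustrative choice.

```python
import numpy as np

def zero_order_sgd(f, x0, step=0.1, h=1e-4, iters=500, seed=0):
    """Minimise f using only (possibly noisy) function evaluations.
    Two-point estimator: g = d * (f(x + h*e) - f(x - h*e)) / (2*h) * e,
    with e a random unit direction in R^d."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for t in range(iters):
        e = rng.standard_normal(d)
        e /= np.linalg.norm(e)
        g = d * (f(x + h * e) - f(x - h * e)) / (2 * h) * e
        x -= step / (t + 1) ** 0.5 * g      # diminishing step size
    return x

# Example: a smooth convex quadratic observed with additive noise.
rng = np.random.default_rng(1)
f = lambda x: np.sum((x - 3.0) ** 2) + 0.01 * rng.standard_normal()
print(zero_order_sgd(f, np.zeros(5)))       # approaches (3, 3, 3, 3, 3)
```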
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Trainability Barriers in Low-Depth QAOA Landscapes [0.0]
Quantum Alternating Operator Ansatz (QAOA) is a prominent variational quantum algorithm for solving optimization problems.
Previous results have given analytical performance guarantees for a small, fixed number of parameters.
We study the difficulty of training in the intermediate regime, which is the focus of most current numerical studies.
arXiv Detail & Related papers (2024-02-15T18:45:30Z) - How Free is Parameter-Free Stochastic Optimization? [29.174036532175855]
We study the problem of parameter-free optimization, inquiring whether, and under what conditions, fully parameter-free methods exist.
Existing methods can only be considered "partially" parameter-free, as they require some non-trivial knowledge of the true problem parameters.
We demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms.
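The hyperparameter search technique is not specified in this snippet; one standard construction of this flavour runs the base method once per learning rate on a geometric grid and keeps the best run, paying only a logarithmic overhead. The grid endpoints and the toy base method below are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def sgd(grad, x0, lr, iters=200):
    """Plain (sub)gradient descent used as the base method."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= lr * grad(x)
    return x

def grid_search_sgd(f, grad, x0, lr_min=1e-4, lr_max=1.0, iters=200):
    """Run the base method once per learning rate on a geometric grid and
    return the best iterate; the grid has O(log(lr_max / lr_min)) points,
    so the cost over a single well-tuned run is only logarithmic."""
    num = int(np.log2(lr_max / lr_min)) + 1
    candidates = [sgd(grad, x0, lr, iters) for lr in np.geomspace(lr_min, lr_max, num)]
    return min(candidates, key=f)

# Example: a simple quadratic objective.
f = lambda x: float(np.sum((x - 1.0) ** 2))
grad = lambda x: 2.0 * (x - 1.0)
print(grid_search_sgd(f, grad, np.zeros(3)))   # approaches the minimiser (1, 1, 1)
```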
arXiv Detail & Related papers (2024-02-05T15:51:49Z) - Parameter-Agnostic Optimization under Relaxed Smoothness [25.608968462899316]
We show that Normalized Stochastic Gradient Descent with Momentum (NSGD-M) can achieve a rate-optimal complexity without prior knowledge of any problem parameter.
In deterministic settings, the exponential factor can be neutralized by employing Gradient Descent with a Backtracking Line Search.
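For reference, the update named in this summary is easy to state: keep an exponential average of stochastic gradients and move a fixed distance along its direction, so no smoothness or noise constants are needed to set the step. This is a generic sketch of that update rule, not the paper's parameter-agnostic analysis; the constants and the toy objective are illustrative.

```python
import numpy as np

def nsgd_m(stoch_grad, x0, step=0.05, beta=0.9, iters=1000, seed=0):
    """Normalized SGD with momentum: m_t = beta*m_{t-1} + (1-beta)*g_t,
    then move a fixed distance `step` in the direction m_t / ||m_t||.
    Only the direction of m_t is used, so the step needs no problem constants."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(iters):
        g = stoch_grad(x, rng)
        m = beta * m + (1.0 - beta) * g
        x -= step * m / (np.linalg.norm(m) + 1e-12)
    return x

# Example: noisy gradients of 0.5 * ||x - 2||^2.
stoch_grad = lambda x, rng: (x - 2.0) + 0.1 * rng.standard_normal(x.size)
print(nsgd_m(stoch_grad, np.zeros(4)))   # approaches (2, 2, 2, 2) up to the step size
```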
arXiv Detail & Related papers (2023-11-06T16:39:53Z) - Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially selects actions from a time-dependent action set based on past experience.
We propose the first online continuous hyperparameter tuning framework for contextual bandits.
We show that it achieves sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z) - A relaxed proximal gradient descent algorithm for convergent plug-and-play with proximal denoiser [6.2484576862659065]
This paper presents a new convergent Plug-and-Play (PnP) algorithm based on a relaxed proximal gradient descent scheme with a proximal denoiser.
The algorithm converges for a wider range of regularization parameters, thus allowing more accurate restoration of an image.
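Schematically, a plug-and-play proximal gradient iteration alternates a gradient step on the data-fidelity term with a denoiser in place of the proximal operator of an explicit regulariser. The sketch below uses soft-thresholding as a stand-in denoiser (so it is really ISTA) just to show the structure of the iteration; it is not the relaxed scheme, the learned proximal denoiser, or the convergence conditions studied in the paper.

```python
import numpy as np

def pnp_pgd(A, y, denoise, x0, gamma=None, iters=200):
    """Plug-and-play proximal gradient descent for  0.5*||A x - y||^2 + phi(x),
    with the proximal map of phi replaced by a denoiser:
        x_{k+1} = denoise( x_k - gamma * A^T (A x_k - y) ).
    gamma defaults to 1 / ||A||_2^2, a safe step for the quadratic term."""
    if gamma is None:
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grad = A.T @ (A @ x - y)          # gradient of the data-fidelity term
        x = denoise(x - gamma * grad)
    return x

# Stand-in denoiser: soft-thresholding (the exact prox of an l1 penalty).
soft = lambda z, t=0.05: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
y = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat = pnp_pgd(A, y, soft, np.zeros(100))   # approximately sparse reconstruction of x_true
```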
arXiv Detail & Related papers (2023-01-31T16:11:47Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Sharp Global Guarantees for Nonconvex Low-rank Recovery in the Noisy Overparameterized Regime [10.787390511207683]
We introduce a novel proof technique that unifies, simplifies, and strengthens two previously competing approaches.
We show that near-second-order points achieve the same minimax recovery bounds as significantly more expensive convex approaches.
arXiv Detail & Related papers (2021-04-21T23:07:18Z) - Refined bounds for algorithm configuration: The knife-edge of dual class approximability [94.83809668933021]
We investigate how large a training set should be to ensure that a parameter's average performance over the training set is close to its expected future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multi-task square-root Lasso-type estimators.
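To unpack "pivotal" for this entry: the square-root Lasso replaces the squared loss with ||y - Xb||_2 / sqrt(n), and the quantity that drives its tuning is invariant to rescaling the noise, so a valid regularization parameter can be chosen without knowing the noise level. The check below is a minimal illustration of that scale-invariance, not the smoothed or multi-task estimators, nor the sup-norm rates, studied in the paper.

```python
import numpy as np

def sqrt_lasso_objective(X, y, b, lam):
    """Square-root Lasso objective:  ||y - X b||_2 / sqrt(n)  +  lam * ||b||_1."""
    n = X.shape[0]
    return np.linalg.norm(y - X @ b) / np.sqrt(n) + lam * np.sum(np.abs(b))

# Why it is pivotal: the tuning quantity  ||X^T eps||_inf / (sqrt(n) * ||eps||_2)
# does not change if the noise eps is rescaled, so lam can be set (e.g. slightly
# above a quantile of this quantity) without knowing the noise standard deviation.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
for sigma in (0.01, 1.0, 100.0):
    e = sigma * eps
    print(sigma, np.max(np.abs(X.T @ e)) / (np.sqrt(n) * np.linalg.norm(e)))
# -> the printed value is identical for every sigma
```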
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.