Stochastic Two Points Method for Deep Model Zeroth-order Optimization
- URL: http://arxiv.org/abs/2402.01621v3
- Date: Mon, 27 May 2024 14:56:01 GMT
- Title: Stochastic Two Points Method for Deep Model Zeroth-order Optimization
- Authors: Yijiang Pang, Jiayu Zhou,
- Abstract summary: This paper introduces an efficient Two-Point (S2P) approach within the gradient-free regime.
We present the theoretical convergence properties of S2P under the general and relaxed smoothness assumptions.
We show that VS2P is highly effective in optimizing objectives for deep models.
- Score: 32.459322001738144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large foundation models, such as large language models, have performed exceptionally well in various application scenarios. Building or fully fine-tuning such large models is usually prohibitive due to either hardware budget or lack of access to backpropagation. The zeroth-order methods offer a promising direction for tackling this challenge, where only forward passes are needed to update the model. This paper introduces an efficient Stochastic Two-Point (S2P) approach within the gradient-free regime. We present the theoretical convergence properties of S2P under the general and relaxed smoothness assumptions, and the derived results help understand and inherently connect the two popular types of zeroth-order methods, basic random search and stochastic three-point method. The theoretical properties also shed light on a Variant of S2P (VS2P), through exploiting our new convergence properties that better represent the dynamics of deep models in training. Our comprehensive empirical results show that VS2P is highly effective in optimizing objectives for deep models. It outperforms or achieves competitive performance compared to standard methods across various model types and scales.
Related papers
- ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting [16.373800112150573]
3D models have recently been popularized by the potentiality of end-to-end training offered by Neural Radiance Fields and 3D Gaussian Splatting models.
We propose an approach enabling both memory and computation scalability of such models.
Our results on popular benchmarks showcase the effectiveness of the proposed approach and open the road to the broad deployability of such a solution even on resource-constrained devices.
arXiv Detail & Related papers (2024-10-30T17:01:28Z) - Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks [24.935016443423233]
This study introduces a novel optimization approach, termed the emphfunctional homotopy method.
By constructing a series of easy-to-hard optimization problems, we iteratively solve these problems using principles derived from established homotopy methods.
We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a $20%-30%$ improvement in success rate over existing methods.
arXiv Detail & Related papers (2024-10-05T17:22:39Z) - Model Ensembling for Constrained Optimization [7.4351710906830375]
We consider a setting in which we wish to ensemble models for multidimensional output predictions that are in turn used for downstream optimization.
More precisely, we imagine we are given a number of models mapping a state space to multidimensional real-valued predictions.
These predictions form the coefficients of a linear objective that we would like to optimize under specified constraints.
We apply multicalibration techniques that lead to two provably efficient and convergent algorithms.
arXiv Detail & Related papers (2024-05-27T01:48:07Z) - Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC)
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z) - The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network Network (DNN) models are used for programming purposes.
In this paper we examine the use of convex neural recovery models.
We show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
We also show that all the stationary non-dimensional objective objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - DiffComplete: Diffusion-based Generative 3D Shape Completion [114.43353365917015]
We introduce a new diffusion-based approach for shape completion on 3D range scans.
We strike a balance between realism, multi-modality, and high fidelity.
DiffComplete sets a new SOTA performance on two large-scale 3D shape completion benchmarks.
arXiv Detail & Related papers (2023-06-28T16:07:36Z) - Human Trajectory Prediction via Neural Social Physics [63.62824628085961]
Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored.
We propose a new method combining both methodologies based on a new Neural Differential Equation model.
Our new model (Neural Social Physics or NSP) is a deep neural network within which we use an explicit physics model with learnable parameters.
arXiv Detail & Related papers (2022-07-21T12:11:18Z) - On the Role of Bidirectionality in Language Model Pre-Training [85.14614350372004]
We study the role of bidirectionality in next token prediction, text infilling, zero-shot priming and fine-tuning.
We train models with up to 6.7B parameters, and find differences to remain consistent at scale.
arXiv Detail & Related papers (2022-05-24T02:25:05Z) - Robust Binary Models by Pruning Randomly-initialized Networks [57.03100916030444]
We propose ways to obtain robust models against adversarial attacks from randomly-d binary networks.
We learn the structure of the robust model by pruning a randomly-d binary network.
Our method confirms the strong lottery ticket hypothesis in the presence of adversarial attacks.
arXiv Detail & Related papers (2022-02-03T00:05:08Z) - A Second look at Exponential and Cosine Step Sizes: Simplicity,
Adaptivity, and Performance [23.89815527019194]
Gradient Descent (SGD) is a popular tool in large-scale machine learning models.
It is highly variable, depending crucially on the choice of the step sizes.
A variety of strategies for tuning the step sizes have been proposed.
arXiv Detail & Related papers (2020-02-12T23:10:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.