Related papers: Training Deep Learning Models with Norm-Constrained LMOs

Related papers

Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes [3.246129789918632]
The training of deep neural networks is inherently a non- optimization problem.<n>Standard approaches such as gradient descent (SGD) require simultaneous updates to parameters.<n>We propose a novel method Alternating Train Miniization with tailored step sizes (SAMT)<n>SAMT achieves better performance with fewer parameter updates compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T08:23:38Z)
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order [38.99428012275441]
Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks.<n>Traditional first-order algorithms incur prohibitive memory and computational costs that scale poorly with model size.<n>We propose zero-order (ZO) optimization methods as a memory- and compute-efficient alternative.
arXiv Detail & Related papers (2025-06-04T20:27:17Z)
LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
Go With the Flow: Fast Diffusion for Gaussian Mixture Models [13.03355083378673]
Schr"odinger Bridges (SB) are diffusion processes that steer, in finite time, a given initial distribution to another final one while minimizing a suitable cost functional.<n>We propose latentmetrization of a set of SB policies for steering a system from one distribution to another.<n>We showcase the potential this approach in low-to-dimensional problems such as image-to-image translation in the space of an autoencoder.
arXiv Detail & Related papers (2024-12-12T08:40:22Z)
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "line theoremarity" establishing a direct relationship between the layer-wise $ell$ reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training [3.195234044113248]
We propose textscNeuroAL, a emphtop-up algorithm for network pruning.<n>It modifies the block-wise and row-wise sparsity exploiting information from both the dense model and its sparse version.<n>It consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off.
arXiv Detail & Related papers (2024-11-11T15:30:16Z)
Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [63.10833446782114]
As language models grow in size, memory demands for backpropagation increase.<n>Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.<n>In this paper, we propose Subspace Zero-order optimization to address the challenges posed by posed by high dimensionality perturbations.
arXiv Detail & Related papers (2024-10-11T17:01:43Z)
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings. We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z)
A Two-Stage Training Method for Modeling Constrained Systems With Neural Networks [3.072340427031969]
This paper describes in detail the two-stage training method for Neural ODEs. The first stage aims at finding feasible NN parameters by minimizing a measure of constraints violation. The second stage aims to find the optimal NN parameters by minimizing the loss function while keeping inside the feasible region.
arXiv Detail & Related papers (2024-03-05T07:37:47Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO) We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
MAST: Model-Agnostic Sparsified Training [4.962431253126472]
We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function. Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators. We present several variants of the Gradient Descent (SGD) method adapted to the new problem formulation.
arXiv Detail & Related papers (2023-11-27T18:56:03Z)
Improving generalization in large language models by learning prefix subspaces [5.911540700785975]
This article focuses on large language models (LLMs) fine-tuning in the scarce data regime (also known as the "few-shot" learning setting) We propose a method to increase the generalization capabilities of LLMs based on neural network subspaces.
arXiv Detail & Related papers (2023-10-24T12:44:09Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL) Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
An Adaptive Incremental Gradient Method With Support for Non-Euclidean Norms [19.41328109094503]
We propose and analyze several novel adaptive variants of the popular SAGA algorithm. We establish its convergence guarantees under general settings. We improve the analysis of SAGA to support non-Euclidean norms.
arXiv Detail & Related papers (2022-04-28T09:43:07Z)
Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
parameter-free algorithms are online learning algorithms that do not require setting learning rates. We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor. Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z)
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation [101.22379613810881]
We consider data-driven optimization problems where one must maximize a function given only queries at a fixed set of points. This problem setting emerges in many domains where function evaluation is a complex and expensive process. We propose a tractable approximation that allows us to scale our method to high-capacity neural network models.
arXiv Detail & Related papers (2021-02-16T06:04:27Z)
Conditional Neural Architecture Search [5.466990830092397]
It is often the case a well-trained ML model does not fit to the constraint of deploying edge platforms. We propose a conditional neural architecture search method using GAN, which produces feasible ML models for different platforms.
arXiv Detail & Related papers (2020-06-06T20:39:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.