Reparameterized LLM Training via Orthogonal Equivalence Transformation
- URL: http://arxiv.org/abs/2506.08001v3
- Date: Tue, 17 Jun 2025 16:44:36 GMT
- Title: Reparameterized LLM Training via Orthogonal Equivalence Transformation
- Authors: Zeju Qiu, Simon Buchholz, Tim Z. Xiao, Maximilian Dax, Bernhard Schölkopf, Weiyang Liu
- Abstract summary: We present POET, a novel training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. POET can stably optimize the objective function with improved generalization. We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
- Score: 54.80172809738605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
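Since the abstract spells out the reparameterization concretely (each weight matrix is replaced by R · W0 · P, where W0 is a fixed random matrix and R, P are learnable orthogonal matrices), a minimal PyTorch sketch of such a layer is given below. It is a rough illustration under stated assumptions, not the authors' implementation: the class name POETLinear, the initialization of W0, and the use of torch.nn.utils.parametrizations.orthogonal to enforce orthogonality are all choices made here for illustration.

```python
# Minimal sketch of the orthogonal-equivalence reparameterization described in the
# abstract: effective weight W = R @ W0 @ P, with W0 fixed at random initialization
# and R, P learnable orthogonal matrices. Names and initialization choices are
# illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class POETLinear(nn.Module):  # hypothetical module name
    def __init__(self, in_dim: int, out_dim: int, bias: bool = True):
        super().__init__()
        # Fixed random weight matrix; registered as a buffer so it is never updated.
        w0 = torch.empty(out_dim, in_dim)
        nn.init.normal_(w0, std=in_dim ** -0.5)
        self.register_buffer("w0", w0)
        # Learnable square matrices constrained to stay orthogonal by PyTorch's
        # built-in orthogonal parametrization.
        self.R = orthogonal(nn.Linear(out_dim, out_dim, bias=False))
        self.P = orthogonal(nn.Linear(in_dim, in_dim, bias=False))
        self.bias = nn.Parameter(torch.zeros(out_dim)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # R and P are orthogonal, so R @ w0 @ P has the same singular values as w0.
        weight = self.R.weight @ self.w0 @ self.P.weight
        return nn.functional.linear(x, weight, self.bias)


if __name__ == "__main__":
    layer = POETLinear(16, 32)
    out = layer(torch.randn(4, 16))
    print(out.shape)  # torch.Size([4, 32])
```

Because only R and P are updated while W0 stays fixed, the singular values of the effective weight never change during training, which corresponds to the spectrum-preserving property the abstract attributes to POET.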
Related papers
- Weight-Parameterization in Continuous Time Deep Neural Networks for Surrogate Modeling [1.629803445577911]
Continuous-time deep learning models, such as neural ordinary differential equations (ODEs), offer a promising framework for surrogate modeling of complex physical systems. A central challenge in training these models lies in learning stable time-varying weights, particularly under computational constraints. This work investigates weight parameterization strategies that constrain the temporal evolution of weights to a low-dimensional subspace spanned by basis functions.
arXiv Detail & Related papers (2025-07-29T17:49:43Z)
- Can a Large Language Model Learn Matrix Functions In Context? [3.7478782183628634]
Large Language Models (LLMs) have demonstrated the ability to solve complex tasks through In-Context Learning (ICL).
This paper explores the capacity of LLMs to solve non-linear numerical computations, with specific emphasis on functions of the Singular Value Decomposition.
arXiv Detail & Related papers (2024-11-24T00:33:43Z)
- Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study [5.6787965501364335]
Surrogate-assisted selection is a core step in evolutionary algorithms to solve expensive optimization problems.
Traditionally, this has relied on conventional machine learning methods, leveraging historical evaluations to predict the performance of new solutions.
In this work, we propose a novel surrogate model based purely on LLM inference capabilities, eliminating the need for training.
arXiv Detail & Related papers (2024-06-15T15:54:00Z)
- Adaptive multiple optimal learning factors for neural network training [0.0]
The proposed Adaptive Multiple Optimal Learning Factors (AMOLF) algorithm dynamically adjusts the number of learning factors based on the error change per multiply.
The thesis also introduces techniques for grouping weights based on the curvature of the objective function and for compressing large Hessian matrices.
arXiv Detail & Related papers (2024-06-04T21:18:24Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers [66.823588073584]
Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performance in various applications.
Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs.
We propose a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs.
arXiv Detail & Related papers (2023-10-02T02:01:16Z)
- Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- Neural Functional Transformers [99.98750156515437]
This paper uses the attention mechanism to define a novel set of permutation equivariant weight-space layers called neural functional Transformers (NFTs).
NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which have exhibited remarkable success across multiple domains.
We also leverage NFTs to develop Inr2Array, a novel method for computing permutation invariant representations from the weights of implicit neural representations (INRs).
arXiv Detail & Related papers (2023-05-22T23:38:27Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large number of scattering elements, passive beamforming is typically challenged by high computational complexity.
In this article, we focus on machine learning (ML) approaches for improving performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.