Related papers: Irrational Complex Rotations Empower Low-bit Optimizers

URL: http://arxiv.org/abs/2501.12896v1
Date: Wed, 22 Jan 2025 14:17:57 GMT
Title: Irrational Complex Rotations Empower Low-bit Optimizers
Authors: Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen
Abstract summary: We propose a novel state compression algorithm, namely $\pi$-Quant, for memory-efficient training. We show that it can reduce the bit-width of parameters to 3.32-bit, achieving a 75% reduction in parameter scale and a 40% decrease in GPU memory usage.
Score: 102.56966165088963
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a novel optimizer state compression algorithm, namely $\pi$-Quant, which leverages the properties of irrational numbers (e.g., $\pi$) for memory-efficient training. The core idea is based on our mathematical findings, which show that a pair of parameters can be represented by a single rotation angle using the complex rotation scheme. Building on this insight, we map the parameters into a complex space and perform quantization using the corresponding rotation angles. To integrate it efficiently into the optimization process, we develop a system of geometric equations that computes the precise rotation angles with linear complexity. We evaluate $\pi$-Quant on a wide range of tasks. Our experiments show that it can reduce the bit-width of parameters to 3.32-bit, achieving a 75% reduction in parameter scale and a 40% decrease in GPU memory usage, all while maintaining full accuracy.
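
Illustration (not from the paper): the abstract does not state the rotation formula, so the sketch below assumes one plausible reading, in which a pair (x, y) with x^2 + y^2 <= 1 is approximated by the average of two unit rotations driven by a single angle, (e^{i*theta} + e^{i*pi*theta}) / 2. Because pi is irrational, these points come arbitrarily close to any target in the unit disk as theta grows. The function names `encode_pair` and `decode_pair` are hypothetical, and the brute-force grid search stands in for the paper's linear-complexity geometric equations, which are not reproduced here.

```python
import numpy as np

def encode_pair(x, y, theta_max=1000.0, step=1e-3):
    """Brute-force search for one angle theta that approximates x + iy.

    Assumed representation (not confirmed by the abstract):
        z(theta) = (exp(i*theta) + exp(i*pi*theta)) / 2
    """
    target = complex(x, y)
    thetas = np.arange(0.0, theta_max, step)
    # Evaluate the assumed representation at every candidate angle.
    points = 0.5 * (np.exp(1j * thetas) + np.exp(1j * np.pi * thetas))
    # Keep the angle whose point lies closest to the target pair.
    best = np.argmin(np.abs(points - target))
    return float(thetas[best])

def decode_pair(theta):
    """Recover the approximate pair (x, y) from a single angle."""
    z = 0.5 * (np.exp(1j * theta) + np.exp(1j * np.pi * theta))
    return float(z.real), float(z.imag)

# Usage: one stored angle stands in for two values.
theta = encode_pair(0.3, -0.5)
x_hat, y_hat = decode_pair(theta)
print(theta, x_hat, y_hat)
```

The search range `theta_max` trades accuracy for cost in this sketch: a longer range lets the curve pass closer to the target, at the price of a larger angle to store.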

Related papers

Improving Gradient Methods via Coordinate Transformations: Applications to Quantum Machine Learning [0.0]
Machine learning algorithms rely heavily on gradient-based optimization algorithms, such as gradient descent and the like. The overall performance depends on the appearance of local minima and barren plateaus, which slow down calculations and lead to non-optimal solutions. In this paper we introduce a generic strategy to accelerate and improve the overall performance of such methods, alleviating the effect of barren plateaus and local minima.
arXiv Detail & Related papers (2023-04-13T18:26:05Z)
RAGO: Recurrent Graph Optimizer For Multiple Rotation Averaging [62.315673415889314]
This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). Our framework is a real-time learning-to-optimize rotation averaging graph with a tiny size deployed for real-world applications.
arXiv Detail & Related papers (2022-12-14T13:19:40Z)
Low-complexity Approximate Convolutional Neural Networks [1.7368964547487395]
We present an approach for minimizing the computational complexity of trained Convolutional Neural Networks (ConvNets). The idea is to approximate all elements of a given ConvNet with efficient approximations capable of extreme reductions in computational complexity. Such low-complexity structures pave the way for low-power, efficient hardware designs.
arXiv Detail & Related papers (2022-07-29T21:59:29Z)
Batch-efficient EigenDecomposition for Small and Medium Matrices [65.67315418971688]
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. We propose a QR-based ED method dedicated to the application scenarios of computer vision.
arXiv Detail & Related papers (2022-07-09T09:14:12Z)
Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations [23.560067934682294]
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations. The proposed algorithm is significantly superior to the existing alternatives in both computational time and predictive accuracy.
arXiv Detail & Related papers (2022-03-07T03:30:35Z)
Unsupervised strategies for identifying optimal parameters in Quantum Approximate Optimization Algorithm [3.508346077709686]
We study unsupervised Machine Learning approaches for setting parameters without optimization. We showcase them within Recursive-QAOA up to depth $3$ where the number of QAOA parameters used per iteration is limited to $3$. We obtain similar performances to the case where we extensively optimize the angles, hence saving numerous circuit calls.
arXiv Detail & Related papers (2022-02-18T19:55:42Z)
Parallelized Computation and Backpropagation Under Angle-Parametrized Orthogonal Matrices [0.0]
We show how an apparently sequential elementary rotation parametrization can be restructured into blocks of commutative operations. We discuss parametric restrictions of interest to generative modeling and present promising performance results with a prototype GPU implementation.
arXiv Detail & Related papers (2021-05-30T00:47:03Z)
Square Root Bundle Adjustment for Large-Scale Reconstruction [56.44094187152862]
We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition. Our approach, which we call square root bundle adjustment, is algebraically equivalent to the commonly used Schur complement trick. We show in real-world experiments with the BAL datasets that even in single precision the proposed solver achieves on average equally accurate solutions.
arXiv Detail & Related papers (2021-03-02T16:26:20Z)
Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms which can align partially overlapping point sets and are invariant to the corresponding transformations. We first show that the objective is a cubic bound function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound. We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.