Learning Best Combination for Efficient N:M Sparsity
- URL: http://arxiv.org/abs/2206.06662v1
- Date: Tue, 14 Jun 2022 07:51:31 GMT
- Title: Learning Best Combination for Efficient N:M Sparsity
- Authors: Yuxin Zhang, Mingbao Lin, Zhihang Lin, Yiting Luo, Ke Li, Fei Chao,
Yongjian Wu, Rongrong Ji
- Abstract summary: N:M learning can be naturally characterized as a combinatorial problem that searches for the best combination within a finite collection.
We show that our learning best combination (LBC) performs consistently better than off-the-shelf N:M sparsity methods across various networks.
- Score: 75.34103761423803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: By forcing at most N out of M consecutive weights to be non-zero, the recent
N:M network sparsity has received increasing attention for its two attractive
advantages: 1) Promising performance at a high sparsity. 2) Significant
speedups on NVIDIA A100 GPUs. Recent studies require an expensive pre-training
phase or a heavy dense-gradient computation. In this paper, we show that the
N:M learning can be naturally characterized as a combinatorial problem which
searches for the best combination candidate within a finite collection.
Motivated by this characteristic, we solve N:M sparsity in an efficient
divide-and-conquer manner. First, we divide the weight vector into
$C_{\text{M}}^{\text{N}}$ combination subsets of a fixed size N. Then, we
conquer the combinatorial problem by assigning each combination a learnable
score that is jointly optimized with its associated weights. We prove that the
introduced scoring mechanism can well model the relative importance between
combination subsets. By gradually removing low-scored subsets, N:M
fine-grained sparsity can be efficiently optimized during the normal training
phase. Comprehensive experiments demonstrate that our learning best combination
(LBC) performs consistently better than off-the-shelf N:M sparsity methods
across various networks. Our code is released at
\url{https://github.com/zyxxmu/LBC}.
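To make the divide-and-conquer recipe above concrete, here is a minimal PyTorch sketch that enumerates the $C_{\text{M}}^{\text{N}}$ combinations inside each group of M weights and attaches a learnable score to each. The hard top-1 selection, the straight-through trick, and all hyper-parameters below are simplifying assumptions for illustration, not the released LBC implementation (the paper instead removes low-scored combinations gradually during training).

```python
# Minimal sketch of scoring C(M, N) weight combinations per group for N:M
# sparsity, loosely following the paper's description. Shapes, the scoring
# module, and the selection rule are illustrative assumptions only.
import itertools
import torch
import torch.nn as nn

class CombinationSparseLinear(nn.Module):
    def __init__(self, in_features, out_features, n=2, m=4):
        super().__init__()
        assert in_features % m == 0
        self.n, self.m = n, m
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # Enumerate all C(M, N) index combinations inside a group of M weights.
        combos = list(itertools.combinations(range(m), n))
        masks = torch.zeros(len(combos), m)
        for i, idx in enumerate(combos):
            masks[i, list(idx)] = 1.0
        self.register_buffer("combo_masks", masks)           # (C, M)
        groups = out_features * in_features // m
        # One learnable score per (group, combination), optimized with the weights.
        self.scores = nn.Parameter(torch.zeros(groups, len(combos)))

    def forward(self, x):
        w = self.weight.view(-1, self.m)                      # (groups, M)
        # Hard selection of the highest-scored combination per group
        # (the paper instead prunes low-scored combinations gradually).
        best = self.scores.argmax(dim=1)                      # (groups,)
        hard = self.combo_masks[best]                         # (groups, M)
        # Straight-through trick so gradients also reach the scores.
        soft = torch.softmax(self.scores, dim=1) @ self.combo_masks
        mask = (hard - soft).detach() + soft
        w_sparse = (w * mask).view_as(self.weight)
        return x @ w_sparse.t()

layer = CombinationSparseLinear(8, 4, n=2, m=4)
y = layer(torch.randn(3, 8))
print(y.shape)  # torch.Size([3, 4])
```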
Related papers
- Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training [3.195234044113248]
We propose NeuroAL, a top-up algorithm for network pruning.
It modifies block-wise and row-wise sparsity by exploiting information from both the dense model and its sparse version.
It consistently outperforms the latest state-of-the-art methods in terms of performance-runtime trade-off.
arXiv Detail & Related papers (2024-11-11T15:30:16Z) - BOND: Aligning LLMs with Best-of-N Distillation [63.254031574394965]
We propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N but without its significant computational overhead at inference time.
Specifically, BOND is a distribution matching algorithm that forces the distribution of generations from the policy to get closer to the Best-of-N distribution.
We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
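As a rough illustration of what matching the Best-of-N distribution means, the toy sketch below computes the Best-of-N distribution over a small discrete set of responses via a standard order-statistics identity and fits a student distribution to it with a forward KL objective. The reward values, N, the KL direction, and all names are illustrative assumptions, not details of BOND.

```python
# Toy illustration of "distill toward the Best-of-N distribution" on a small
# discrete outcome space; not the BOND algorithm itself.
import numpy as np

rng = np.random.default_rng(0)
K, N = 6, 4                                  # 6 possible responses, best-of-4
reward = np.array([0.1, 0.5, 0.2, 0.9, 0.3, 0.7])
p = rng.dirichlet(np.ones(K))                # reference policy over responses

# Best-of-N distribution via order statistics: sort outcomes by reward and use
# P(best-of-N = x) = F(<= x)^N - F(< x)^N, where F is the CDF under p.
order = np.argsort(reward)
cdf = np.cumsum(p[order])
bon_sorted = cdf**N - np.concatenate(([0.0], cdf[:-1]))**N
bon = np.empty(K)
bon[order] = bon_sorted

# "Distill": fit a student categorical distribution by minimizing forward KL,
# whose minimizer in this toy case is the Best-of-N distribution itself.
logits = np.zeros(K)
for _ in range(2000):
    q = np.exp(logits) / np.exp(logits).sum()
    logits -= 0.5 * (q - bon)                # gradient of KL(bon || q) w.r.t. logits

print(np.round(bon, 3))
print(np.round(np.exp(logits) / np.exp(logits).sum(), 3))
```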
arXiv Detail & Related papers (2024-07-19T18:38:25Z) - Exact Combinatorial Optimization with Temporo-Attentional Graph Neural
Networks [17.128882942475]
We investigate two essential aspects of machine learning algorithms for optimization: temporal characteristics and attention.
We argue that for the task of variable selection in the branch-and-bound (B&B) algorithm, incorporating the temporal information as well as the bipartite graph attention improves the solver's performance.
arXiv Detail & Related papers (2023-11-23T08:07:15Z) - Solving a Class of Non-Convex Minimax Optimization in Federated Learning [84.98927714326908]
Minimax problems arise throughout machine learning applications, ranging from model training to large-scale learning.
We propose a class of algorithms for such non-convex minimax problems that reduce the sample complexity to $O(\varepsilon^{-6})$.
We prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{2}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-3})$.
arXiv Detail & Related papers (2023-10-05T15:48:41Z) - Implicitly normalized forecaster with clipping for linear and non-linear
heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) is considered an optimal solution for adversarial multi-armed bandit (MAB) problems.
We propose a new version of INF called the Implicitly Normalized Forecaster with clipping (INFclip) for MAB problems in heavy-tailed settings.
We demonstrate that INFclip is optimal for linear heavy-tailed MAB problems and works well for non-linear ones.
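The sketch below does not reproduce INF itself; it only illustrates the clipping idea from the summary inside a plain EXP3-style bandit update, where the raw heavy-tailed observation is clipped before importance weighting so a single extreme draw cannot blow up the estimate. All constants and distributions are arbitrary assumptions.

```python
# Clipping heavy-tailed importance-weighted estimates in a simple EXP3-style
# loop; for illustration only, not the INF/INFclip algorithm from the paper.
import numpy as np

rng = np.random.default_rng(1)
K, T = 5, 5000
gamma, lr, clip = 0.05, 0.01, 10.0
scale = np.array([1.0, 1.5, 0.5, 2.0, 1.2])        # per-arm reward scale
logits = np.zeros(K)

for t in range(T):
    w = np.exp(logits - logits.max())
    probs = (1 - gamma) * w / w.sum() + gamma / K   # mix in uniform exploration
    arm = rng.choice(K, p=probs)
    reward = scale[arm] * (rng.pareto(2.1) + 1.0)   # heavy-tailed draw
    est = min(reward, clip) / probs[arm]            # clip before importance weighting
    logits[arm] += lr * est

print("final play probabilities:", np.round(probs, 3))
```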
arXiv Detail & Related papers (2023-05-11T12:00:43Z) - Searching Large Neighborhoods for Integer Linear Programs with
Contrastive Learning [39.40838358438744]
Integer Linear Programs (ILPs) are powerful tools for modeling and solving a large number of optimization problems.
Large Neighborhood Search (LNS), as a heuristic algorithm, can find high-quality solutions to ILPs faster than Branch and Bound.
We propose a novel approach, CL-LNS, that delivers state-of-the-art anytime performance on several ILP benchmarks across multiple metrics.
arXiv Detail & Related papers (2023-02-03T07:15:37Z) - DOGE-Train: Discrete Optimization on GPU with End-to-end Training [28.795080637690095]
We present a fast, scalable, data-driven approach for solving relaxations of 0-1 integer linear programs.
We use a combination of graph neural networks (GNN) and the Lagrange decomposition based algorithm FastDOG.
arXiv Detail & Related papers (2022-05-23T21:09:41Z) - SiMaN: Sign-to-Magnitude Network Binarization [165.5630656849309]
We show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s, and 0s otherwise.
We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization.
Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet.
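As a loose illustration of the "{0, +1} encoding of high-magnitude weights" mentioned above, the sketch below keeps a fixed fraction of the largest-magnitude weights as +1 (times a scale) and zeroes the rest; the keep ratio and the scale are assumptions, not the analytical solution derived in SiMaN.

```python
# Minimal {0, +1} weight encoding in the spirit of the summary above: large-
# magnitude weights map to +1 (scaled), the rest to 0. Illustrative only.
import numpy as np

def binarize_sign_to_magnitude(w, keep_ratio=0.5):
    """Encode a weight vector into alpha * b with b in {0, 1}."""
    k = max(1, int(round(keep_ratio * w.size)))
    keep = np.argsort(np.abs(w))[-k:]            # indices of high-magnitude weights
    b = np.zeros_like(w)
    b[keep] = 1.0
    alpha = np.abs(w[keep]).mean()               # simple magnitude-matching scale
    return alpha * b, b

rng = np.random.default_rng(0)
w = rng.normal(size=16)
w_bin, code = binarize_sign_to_magnitude(w)
print(code.astype(int))
print(np.round(w_bin, 3))
```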
arXiv Detail & Related papers (2021-02-16T07:03:51Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
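For reference, a magnitude-based N:M mask (keep the N largest-magnitude weights in every group of M consecutive weights) can be written in a few lines; the sketch below shows only this mask construction, not the training-from-scratch procedure studied in the paper.

```python
# Build a {0,1} mask that keeps the top-N magnitudes in each group of M weights.
import torch

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Return a mask with at most N non-zeros per group of M consecutive weights."""
    flat = weight.reshape(-1, m)                       # (groups, M)
    idx = flat.abs().topk(n, dim=1).indices            # top-N magnitudes per group
    mask = torch.zeros_like(flat)
    mask.scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

w = torch.randn(4, 8)
mask = nm_mask(w, n=2, m=4)
print((mask.reshape(-1, 4).sum(dim=1) == 2).all())     # every group keeps exactly 2
print(w * mask)
```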
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - Curriculum learning for multilevel budgeted combinatorial problems [7.804994311050265]
Multilevel optimization problems generalize combinatorial problems, encompassing situations with multiple players taking decisions sequentially.
We devise a value-based method to learn to solve multilevel budgeted problems involving two players in a zero-sum game over a graph.
Our framework is based on a simple curriculum: if an agent knows how to estimate the value of instances with budgets up to $B$, then solving instances with budget $B+1$ can be done in polynomial time, regardless of the direction of the optimization, by checking the value of every possible afterstate.
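A minimal sketch of that bootstrapping step: given a value estimator for budget-$B$ instances, one budget-$(B+1)$ move is chosen by scoring every possible afterstate with that estimator. The toy state, actions, and value function below are illustrative assumptions only.

```python
# Choose a budget-(B+1) move by evaluating all afterstates with a budget-B
# value estimator; toy example, not the paper's learned value network.
from typing import Callable, Iterable, Tuple

State = frozenset          # toy state: set of already-selected graph nodes

def best_move(state: State,
              candidates: Iterable[int],
              value_b: Callable[[State], float],
              maximizing: bool = True) -> Tuple[int, float]:
    """Pick the action whose afterstate scores best under the budget-B value."""
    scored = [(a, value_b(state | {a})) for a in candidates if a not in state]
    key = lambda t: t[1]
    return max(scored, key=key) if maximizing else min(scored, key=key)

# Toy value estimator: pretend node weights summarize a budget-B solution value.
node_weight = {0: 1.0, 1: 3.0, 2: 2.0, 3: 0.5}
value_b = lambda s: sum(node_weight[v] for v in s)

action, value = best_move(frozenset({0}), candidates=node_weight, value_b=value_b)
print(action, value)   # picks node 1 with afterstate value 4.0
```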
arXiv Detail & Related papers (2020-07-07T01:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.