SequentialAttention++ for Block Sparsification: Differentiable Pruning
Meets Combinatorial Optimization
- URL: http://arxiv.org/abs/2402.17902v1
- Date: Tue, 27 Feb 2024 21:42:18 GMT
- Title: SequentialAttention++ for Block Sparsification: Differentiable Pruning
Meets Combinatorial Optimization
- Authors: Taisuke Yasuda, Kyriakos Axiotis, Gang Fu, MohammadHossein Bateni,
Vahab Mirrokni
- Abstract summary: Neural network pruning is a key technique towards engineering large yet scalable, interpretable, and generalizable models.
We show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization.
We propose SequentialAttention++, which advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
- Score: 24.55623897747344
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network pruning is a key technique towards engineering large yet
scalable, interpretable, and generalizable models. Prior work on the subject
has developed largely along two orthogonal directions: (1) differentiable
pruning for efficiently and accurately scoring the importance of parameters,
and (2) combinatorial optimization for efficiently searching over the space of
sparse models. We unite the two approaches, both theoretically and empirically,
to produce a coherent framework for structured neural network pruning in which
differentiable pruning guides combinatorial optimization algorithms to select
the most important sparse set of parameters. Theoretically, we show how many
existing differentiable pruning techniques can be understood as nonconvex
regularization for group sparse optimization, and prove that for a wide class
of nonconvex regularizers, the global optimum is unique, group-sparse, and
provably yields an approximate solution to a sparse convex optimization
problem. The resulting algorithm that we propose, SequentialAttention++,
advances the state of the art in large-scale neural network block-wise pruning
tasks on the ImageNet and Criteo datasets.
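As a concrete illustration of the pattern described in the abstract, here is a minimal NumPy sketch that combines a differentiable importance score per parameter block (a softmax attention mask over blocks) with a combinatorial top-k selection of which blocks to keep. The scoring rule, function names, and block layout are illustrative assumptions for exposition, not the authors' exact SequentialAttention++ algorithm.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax; serves as a differentiable attention mask over blocks."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def block_scores(weights, logits, block_size):
    """Differentiable importance score per block: attention weight times block norm.
    (Illustrative scoring rule, not the paper's exact parameterization.)"""
    attn = softmax(logits)
    blocks = weights.reshape(-1, block_size)
    return attn * np.linalg.norm(blocks, axis=1)

def prune_to_top_k_blocks(weights, scores, block_size, k):
    """Combinatorial step: keep the k highest-scoring blocks, zero out the rest."""
    kept = np.argsort(scores)[::-1][:k]
    mask = np.zeros_like(weights).reshape(-1, block_size)
    mask[kept] = 1.0
    return weights * mask.reshape(weights.shape)

# Toy usage: 8 blocks of 4 weights each, keep the 3 most important blocks.
rng = np.random.default_rng(0)
weights = rng.normal(size=32)
logits = rng.normal(size=8)   # in practice, trained jointly with the network weights
scores = block_scores(weights, logits, block_size=4)
pruned = prune_to_top_k_blocks(weights, scores, block_size=4, k=3)
```

Intuitively, the differentiable scores learned during training play the role of the nonconvex regularizer driving group sparsity, while the top-k block selection is the combinatorial search step that those scores guide.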
Related papers
- Dynamic Incremental Optimization for Best Subset Selection [15.8362578568708]
Best subset selection is considered the gold standard for many learning problems.
An efficient subset-dual algorithm is developed based on the primal and dual problem structures.
arXiv Detail & Related papers (2024-02-04T02:26:40Z) - Composite Optimization Algorithms for Sigmoid Networks [3.160070867400839]
We propose composite optimization algorithms based on linearized proximal algorithms and the alternating direction method of multipliers.
Numerical experiments on Franke's function fitting show that the proposed algorithms perform satisfactorily and robustly.
arXiv Detail & Related papers (2023-03-01T15:30:29Z) - Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling the differentiable and non-differentiable components separately, linearizing only the smooth parts.
arXiv Detail & Related papers (2023-02-24T18:41:48Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connection between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - On Constraints in First-Order Optimization: A View from Non-Smooth
Dynamical Systems [99.59934203759754]
We introduce a class of first-order methods for smooth constrained optimization.
Two distinctive features of our approach are that projections or optimizations over the entire feasible set are avoided.
The resulting algorithmic procedure is simple to implement even when constraints are nonlinear.
arXiv Detail & Related papers (2021-07-17T11:45:13Z) - Combinatorial Optimization for Panoptic Segmentation: An End-to-End
Trainable Approach [23.281726932718232]
We propose an end-to-end trainable architecture for simultaneous semantic and instance segmentation.
Our approach shows the utility of using optimization in tandem with deep learning on a challenging, large-scale real-world problem.
arXiv Detail & Related papers (2021-06-06T17:39:13Z) - Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating prediction accuracy and network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z) - A Flexible Framework for Designing Trainable Priors with Adaptive
Smoothing and Game Encoding [57.1077544780653]
We introduce a general framework for designing and training neural network layers whose forward passes can be interpreted as solving non-smooth convex optimization problems.
We focus on convex games, solved by local agents represented by the nodes of a graph and interacting through regularization functions.
This approach is appealing for solving imaging problems, as it allows the use of classical image priors within deep models that are trainable end to end.
arXiv Detail & Related papers (2020-06-26T08:34:54Z) - Stochastic batch size for adaptive regularization in deep network
optimization [63.68104397173262]
We propose a first-order optimization algorithm incorporating adaptive regularization, applicable to machine learning problems in the deep learning framework.
We empirically demonstrate the effectiveness of our algorithm using an image classification task based on conventional network models applied to commonly used benchmark datasets.
arXiv Detail & Related papers (2020-04-14T07:54:53Z)