Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach
- URL: http://arxiv.org/abs/2008.13590v1
- Date: Mon, 31 Aug 2020 13:28:03 GMT
- Title: Efficient and Sparse Neural Networks by Pruning Weights in a
Multiobjective Learning Approach
- Authors: Malena Reiners and Kathrin Klamroth and Michael Stiglmayr
- Abstract summary: We propose a multiobjective perspective on the training of neural networks by treating the prediction accuracy and the network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Overparameterization and overfitting are common concerns when designing and
training deep neural networks; they are often counteracted by pruning and
regularization strategies. However, these strategies remain secondary to most
learning approaches and involve time- and computation-intensive procedures. We
suggest a multiobjective perspective on the training of neural networks by
treating the prediction accuracy and the network complexity as two
individual objective functions in a biobjective optimization problem. As a
showcase example, we use the cross entropy as a measure of the prediction
accuracy while adopting an l1-penalty function to assess the total cost (or
complexity) of the network parameters. The latter is combined with an
intra-training pruning approach that reinforces complexity reduction and
requires only marginal extra computational cost. From the perspective of
multiobjective optimization, this is a truly large-scale optimization problem.
We compare two different optimization paradigms: On the one hand, we adopt a
scalarization-based approach that transforms the biobjective problem into a
series of weighted-sum scalarizations. On the other hand, we implement
stochastic multi-gradient descent algorithms that generate a single Pareto
optimal solution without requiring or using preference information. In the
first case, favorable knee solutions are identified by repeated training runs
with adaptively selected scalarization parameters. Preliminary numerical
results on exemplary convolutional neural networks confirm that large
reductions in the complexity of neural networks with negligible loss of accuracy
are possible.
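To make the approach concrete, the following is a minimal PyTorch sketch of one weighted-sum training step, combining cross entropy with an l1-penalty and a magnitude-based intra-training pruning pass; the architecture, the scalarization weight lam, and the pruning threshold eps are illustrative assumptions rather than the authors' exact configuration.
```python
import torch
import torch.nn as nn

# Sketch: weighted-sum scalarization of the biobjective problem
#   min_w  (1 - lam) * CrossEntropy(w) + lam * ||w||_1
# with magnitude-based intra-training pruning (illustrative setup).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()
lam, eps = 1e-4, 1e-3            # scalarization weight and pruning threshold (assumed)

def train_step(x, y):
    opt.zero_grad()
    l1 = sum(p.abs().sum() for p in model.parameters())
    loss = (1 - lam) * ce(model(x), y) + lam * l1    # weighted-sum scalarization
    loss.backward()
    opt.step()
    with torch.no_grad():                            # intra-training pruning:
        for p in model.parameters():                 # zero out near-zero weights
            p[p.abs() < eps] = 0.0
    return loss.item()

x, y = torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))
print(train_step(x, y))
```
Sweeping lam over a grid of values and retraining, as in the repeated runs described above, traces out an approximation of the Pareto front between accuracy and sparsity.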
Related papers
- Self-Supervised Learning of Iterative Solvers for Constrained Optimization [0.0]
We propose a learning-based iterative solver for constrained optimization.
It can obtain very fast and accurate solutions by customizing the solver to a specific parametric optimization problem.
A novel loss function based on the Karush-Kuhn-Tucker optimality conditions is introduced, enabling fully self-supervised training of the neural networks.
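As a rough illustration of a KKT-based self-supervised loss (the networks, and the toy objective f and constraint g below, are assumptions; the paper's construction may differ):
```python
import torch
import torch.nn as nn

x_net  = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
mu_net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1), nn.Softplus())

def f(x):                                # toy objective: min ||x||^2
    return (x ** 2).sum(-1)

def g(x):                                # toy constraint: g(x) = 1 - sum(x) <= 0
    return 1.0 - x.sum(-1)

def kkt_loss(params):
    x, mu = x_net(params), mu_net(params).squeeze(-1)
    gf = torch.autograd.grad(f(x).sum(), x, create_graph=True)[0]
    gg = torch.autograd.grad(g(x).sum(), x, create_graph=True)[0]
    stationarity = (gf + mu.unsqueeze(-1) * gg).pow(2).sum(-1)  # grad of Lagrangian = 0
    feasibility  = torch.relu(g(x)).pow(2)                      # g(x) <= 0
    slackness    = (mu * g(x)).pow(2)                           # mu * g(x) = 0
    return (stationarity + feasibility + slackness).mean()      # no labels needed

loss = kkt_loss(torch.randn(8, 4))       # mu >= 0 is enforced by the Softplus
loss.backward()
```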
arXiv Detail & Related papers (2024-09-12T14:17:23Z) - SequentialAttention++ for Block Sparsification: Differentiable Pruning
Meets Combinatorial Optimization [24.55623897747344]
Neural network pruning is a key technique for engineering models that are large yet scalable, interpretable, and generalizable.
We show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization.
We propose SequentialAttention++, which advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
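A hedged sketch of this regularization view of pruning, using a simple nonconvex group penalty (the block partition and penalty choice are assumptions, not the SequentialAttention++ mechanism):
```python
import torch
import torch.nn as nn

def group_penalty(W, block=16):
    # one norm per block of rows; sqrt(norm) is a simple nonconvex choice
    # that pushes whole blocks toward exactly zero during training
    norms = torch.stack([W[i:i + block].norm() for i in range(0, W.shape[0], block)])
    return norms.sqrt().sum()

layer = nn.Linear(256, 256)
opt = torch.optim.SGD(layer.parameters(), lr=0.05)
x, y = torch.randn(32, 256), torch.randn(32, 256)
for _ in range(10):
    opt.zero_grad()
    loss = (layer(x) - y).pow(2).mean() + 1e-3 * group_penalty(layer.weight)
    loss.backward()
    opt.step()
```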
arXiv Detail & Related papers (2024-02-27T21:42:18Z) - Analyzing and Enhancing the Backward-Pass Convergence of Unrolled
Optimization [50.38518771642365]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
A central challenge in this setting is backpropagation through the solution of an optimization problem, which often lacks a closed form.
This paper provides theoretical insights into the backward pass of unrolled optimization, showing that it is equivalent to the solution of a linear system by a particular iterative method.
A system called Folded Optimization is proposed to construct more efficient backpropagation rules from unrolled solver implementations.
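A toy PyTorch illustration of this linear-system view (the map T, a single gradient step on a quadratic, is an assumption): for a solver with fixed point x* = T(x*, theta), the backward pass solves (I - J)^T v = g by the fixed-point iteration v <- J^T v + g, then contracts v against dT/dtheta.
```python
import torch

theta = torch.tensor([2.0, 1.0], requires_grad=True)

def T(x, theta):                          # toy solver update: one gradient step
    return x - 0.1 * (theta[0] * x - theta[1])

x = torch.zeros(1)
for _ in range(200):                      # forward pass: run solver to fixed point
    x = T(x, theta).detach()

x.requires_grad_(True)
y = T(x, theta)                           # one extra step, kept on the graph
g = torch.ones_like(x)                    # upstream gradient dL/dx*
v = g.clone()
for _ in range(100):                      # solve (I - J)^T v = g iteratively:
    Jv = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)[0]
    v = Jv + g                            #   v <- J^T v + g

dtheta = torch.autograd.grad(y, theta, grad_outputs=v)[0]
print(dtheta)                             # ~ gradient of x*(theta) = theta1/theta0
```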
arXiv Detail & Related papers (2023-12-28T23:15:18Z) - Reducing the Need for Backpropagation and Discovering Better Optima With
Explicit Optimizations of Neural Networks [4.807347156077897]
We propose a computationally efficient alternative for optimizing neural networks.
We derive an explicit solution to a simple feed-forward language model.
We show that the explicit solutions perform near-optimally in experiments.
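A heavily hedged toy analogue of an explicit solution: for a linear output layer, the optimum is available in closed form from the normal equations, with no backpropagation loop (the regression setup below is an assumption; the paper derives its solution for a specific feed-forward language model).
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))            # features (e.g. hidden activations)
W_true = rng.normal(size=(64, 10))
Y = X @ W_true + 0.01 * rng.normal(size=(1000, 10))

lam = 1e-3                                 # small ridge term for stability
W = np.linalg.solve(X.T @ X + lam * np.eye(64), X.T @ Y)   # explicit optimum
print(np.abs(W - W_true).max())            # close to the generating weights
```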
arXiv Detail & Related papers (2023-11-13T17:38:07Z) - Sparse-Input Neural Network using Group Concave Regularization [10.103025766129006]
Simultaneous feature selection and non-linear function estimation are challenging in neural networks.
We propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings.
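A sketch of the idea with the minimax concave penalty (MCP) applied group-wise to the first layer, so that each input feature's outgoing weights are selected or zeroed together; lam, gamma, and the tiny network are assumed for illustration.
```python
import torch
import torch.nn as nn

def mcp(t, lam=0.1, gamma=3.0):
    # minimax concave penalty, applied elementwise to group norms t >= 0
    quad = lam * t - t ** 2 / (2 * gamma)
    return torch.where(t <= gamma * lam, quad, torch.full_like(t, 0.5 * gamma * lam ** 2))

net = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x, y = torch.randn(128, 50), torch.randn(128, 1)
for _ in range(100):
    opt.zero_grad()
    group_norms = net[0].weight.norm(dim=0)          # one norm per input feature
    loss = (net(x) - y).pow(2).mean() + mcp(group_norms).sum()
    loss.backward()
    opt.step()
```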
arXiv Detail & Related papers (2023-07-01T13:47:09Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
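For context, a minimal sketch of algorithm unrolling itself, where autograd differentiates through every solver iteration (the quadratic inner problem is a toy assumption; the paper's contribution is replacing this expensive backward pass with an analytical model):
```python
import torch

theta = torch.tensor([2.0, 1.0], requires_grad=True)
x = torch.zeros(1)
for _ in range(50):                            # unrolled gradient descent on
    x = x - 0.1 * (theta[0] * x - theta[1])    # f(x) = theta0/2 x^2 - theta1 x

loss = (x - 3.0).pow(2).sum()       # outer loss on the (approximate) argmin
loss.backward()                     # backpropagates through all 50 steps
print(theta.grad)
```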
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - Acceleration techniques for optimization over trained neural network
ensembles [1.0323063834827415]
We study optimization problems where the objective function is modeled through feedforward neural networks with rectified linear unit activation.
We present a mixed-integer linear program based on existing popular big-$M$ formulations for optimizing over a single neural network.
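For reference, the standard big-M-style encoding of a single ReLU neuron that such formulations build on, assuming known pre-activation bounds L <= w^T x + b <= U:
```latex
% MILP encoding of y = max(0, w^T x + b) with binary z indicating activity
\begin{align*}
  y &\ge w^\top x + b, & y &\ge 0,\\
  y &\le w^\top x + b - L\,(1 - z), & y &\le U z,\\
  z &\in \{0, 1\}.
\end{align*}
```
When z = 1 the constraints force y = w^T x + b (the active case), and when z = 0 they force y = 0, so the integer variable selects the ReLU's piece.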
arXiv Detail & Related papers (2021-12-13T20:50:54Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training, and gradient-based meta-learning.
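A minimal sketch of the joint idea: interleave fixed-point updates on the equilibrium state z with gradient updates on the input x in a single loop (the layer, objective, and plain iteration below are assumptions; practical deep equilibrium models use faster root solvers and implicit differentiation).
```python
import torch
import torch.nn as nn

W, U = nn.Linear(16, 16), nn.Linear(8, 16)   # one layer, iterated to equilibrium
W.requires_grad_(False)
U.requires_grad_(False)

x = torch.randn(4, 8, requires_grad=True)    # input we optimize over
target = torch.randn(4, 16)                  # assumed objective on the output
opt = torch.optim.Adam([x], lr=0.05)
z = torch.zeros(4, 16)
for _ in range(200):                         # one joint loop:
    z = torch.tanh(W(z.detach()) + U(x))     #   equilibrium update z <- f(z, x)
    loss = (z - target).pow(2).mean()        #   objective over the input
    opt.zero_grad()
    loss.backward()                          #   gradient step on x
    opt.step()
```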
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
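A sketch of learning a line of networks: keep two endpoint weight sets, sample an interpolation coefficient each step, and train the interpolated model so that every point on the line becomes accurate (the tiny model and plain loop are assumptions).
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

w1, w2 = nn.Linear(20, 2), nn.Linear(20, 2)      # two endpoints of the line
opt = torch.optim.SGD(list(w1.parameters()) + list(w2.parameters()), lr=0.1)
ce = nn.CrossEntropyLoss()
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))

for _ in range(100):
    a = torch.rand(())                           # sample a point on the line
    weight = a * w1.weight + (1 - a) * w2.weight
    bias = a * w1.bias + (1 - a) * w2.bias
    loss = ce(F.linear(x, weight, bias), y)      # train the interpolated model
    opt.zero_grad()
    loss.backward()
    opt.step()
```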
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
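A sketch of the surrogate idea: replace a large decision vector x in R^n by x = Pz with a low-dimensional z, and run the optimization in the cheap z-space (the quadratic objective is a toy assumption, and P is fixed here, whereas the paper learns it end-to-end through decision quality).
```python
import torch

n, k = 1000, 10
P = torch.randn(n, k) / n ** 0.5       # surrogate map (learned in the paper)
c = torch.randn(n)                     # predicted problem parameters

def solve_surrogate(c, steps=200):
    z = torch.zeros(k, requires_grad=True)
    opt = torch.optim.SGD([z], lr=0.05)
    for _ in range(steps):
        x = P @ z                                 # lift k-dim decision to n dims
        loss = 0.5 * x.pow(2).sum() + c @ x       # original objective, now over z
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (P @ z).detach()

x_hat = solve_surrogate(c)
```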
arXiv Detail & Related papers (2020-06-18T19:11:54Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while retaining theoretical convergence guarantees.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
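A simplified stand-in for the objective being optimized: AUC counts correctly ordered positive/negative score pairs, so a pairwise squared hinge on score differences gives a differentiable proxy (the cited paper instead uses a min-max reformulation suited to distributed training).
```python
import torch

def auc_surrogate(scores, labels, margin=1.0):
    pos = scores[labels == 1]                       # positive-class scores
    neg = scores[labels == 0]                       # negative-class scores
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)      # all positive/negative pairs
    return torch.relu(margin - diff).pow(2).mean()  # squared hinge on the ranking

scores = torch.randn(32, requires_grad=True)
labels = torch.cat([torch.ones(16), torch.zeros(16)]).long()
loss = auc_surrogate(scores, labels)
loss.backward()
```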
arXiv Detail & Related papers (2020-05-05T18:08:23Z)