Efficient Multi-Objective Optimization for Deep Learning
- URL: http://arxiv.org/abs/2103.13392v1
- Date: Wed, 24 Mar 2021 17:59:42 GMT
- Title: Efficient Multi-Objective Optimization for Deep Learning
- Authors: Michael Ruchte and Josif Grabocka
- Abstract summary: Multi-objective optimization (MOO) is a prevalent challenge for Deep Learning.
There exists no scalable MOO solution for truly deep neural networks.
- Score: 2.0305676256390934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-objective optimization (MOO) is a prevalent challenge for Deep
Learning, however, there exists no scalable MOO solution for truly deep neural
networks. Prior work either demand optimizing a new network for every point on
the Pareto front, or induce a large overhead to the number of trainable
parameters by using hyper-networks conditioned on modifiable preferences. In
this paper, we propose to condition the network directly on these preferences
by augmenting them to the feature space. Furthermore, we ensure a well-spread
Pareto front by penalizing the solutions to maintain a small angle to the
preference vector. In a series of experiments, we demonstrate that our Pareto
fronts achieve state-of-the-art quality despite being computed significantly
faster. Furthermore, we showcase the scalability as our method approximates the
full Pareto front on the CelebA dataset with an EfficientNet network at a tiny
training time overhead of 7% compared to a simple single-objective
optimization. We make our code publicly available at
https://github.com/ruchtem/cosmos.
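For intuition, the two ingredients described in the abstract (conditioning on the preference vector in feature space, and penalizing a large angle between the achieved losses and that preference) can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' released implementation (see the linked repository for that): the preference is simply concatenated to the inputs, a preference-weighted scalarization is assumed for the overall loss, and a cosine-similarity term keeps the loss vector aligned with the preference ray. Names such as `PrefConditionedNet`, `cosmos_style_step`, the toy MSE objectives and the weight `lam` are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefConditionedNet(nn.Module):
    """Toy network that receives the preference vector as extra input features."""
    def __init__(self, in_dim, n_objectives, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, n_objectives),  # one output per objective (toy setup)
        )

    def forward(self, x, pref):
        # Augment the feature space with the preference vector, broadcast per sample.
        pref = pref.expand(x.shape[0], -1)
        return self.body(torch.cat([x, pref], dim=-1))

def cosmos_style_step(model, optimizer, x, targets, n_objectives, lam=2.0):
    """One training step: preference-weighted scalarization plus a cosine-alignment penalty."""
    # Sample a random preference from the simplex.
    pref = torch.distributions.Dirichlet(torch.ones(n_objectives)).sample()
    outputs = model(x, pref)
    # Toy per-objective losses; a real setup would use task-specific criteria.
    losses = torch.stack([F.mse_loss(outputs[:, i], targets[:, i]) for i in range(n_objectives)])
    scalarized = torch.dot(pref, losses)
    # Penalize a large angle between the loss vector and the preference ray.
    penalty = 1.0 - F.cosine_similarity(losses.unsqueeze(0), pref.unsqueeze(0)).squeeze()
    loss = scalarized + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on random toy data (all dimensions are arbitrary assumptions).
model = PrefConditionedNet(in_dim=10, n_objectives=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
cosmos_style_step(model, opt, torch.randn(16, 10), torch.randn(16, 2), n_objectives=2)
```

Because a single network sees many preferences during training, sweeping the preference vector at test time traces out an approximate Pareto front without retraining, consistent with the small training-time overhead reported in the abstract.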
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can be easily changed by training the networks better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way to optimize hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that this objective can be applied to optimize a variety of different hyperparameters in a single training run.
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
A deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings: solving for the network's fixed point and optimizing over its inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Self-Evolutionary Optimization for Pareto Front Learning [34.17125297176668]
Multi-objective optimization (MOO) approaches have been proposed for multitasking problems.
Recent MOO methods approximate multiple optimal solutions (Pareto front) with a single unified model.
We show that Pareto front learning (PFL) can be re-formulated into another MOO problem with multiple objectives, each of which corresponds to different preference weights for the tasks.
arXiv Detail & Related papers (2021-10-07T13:38:57Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Learning the Pareto Front with Hypernetworks [44.72371822514582]
Multi-objective optimization (MOO) problems are prevalent in machine learning.
These problems have a set of optimal solutions, called the Pareto front, where each point on the front represents a different trade-off between possibly conflicting objectives.
Recent MOO methods can target a specific desired ray in loss space; however, most approaches still face two grave limitations (see the hypernetwork sketch after this list).
arXiv Detail & Related papers (2020-10-08T16:39:20Z)
- Efficient and Sparse Neural Networks by Pruning Weights in a Multiobjective Learning Approach [0.0]
We propose a multiobjective perspective on the training of neural networks by treating prediction accuracy and network complexity as two individual objective functions.
Preliminary numerical results on exemplary convolutional neural networks confirm that large reductions in the complexity of neural networks with negligible loss of accuracy are possible.
arXiv Detail & Related papers (2020-08-31T13:28:03Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
In theory, our method requires a much smaller number of communication rounds.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
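As a contrast with the feature-space conditioning used by the main paper, the hypernetwork approach of "Learning the Pareto Front with Hypernetworks" (referenced above) can be sketched as follows. This is an illustrative PyTorch sketch under assumed toy sizes, not that paper's implementation; `ParetoHypernet`, `TargetShape` and all dimensions are placeholders. It mainly shows where the trainable-parameter overhead mentioned in the main abstract comes from: the hypernetwork must emit every weight of the target model from the preference vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetShape:
    """Shapes of a tiny two-layer MLP whose weights are generated rather than learned directly."""
    def __init__(self, in_dim, hidden, out_dim):
        self.shapes = [(hidden, in_dim), (hidden,), (out_dim, hidden), (out_dim,)]
        self.numel = sum(int(torch.tensor(s).prod()) for s in self.shapes)

class ParetoHypernet(nn.Module):
    """Maps a single preference vector to the full weight vector of the target MLP."""
    def __init__(self, n_objectives, target, hidden=128):
        super().__init__()
        self.target = target
        self.net = nn.Sequential(
            nn.Linear(n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, target.numel),  # output size scales with the entire target model
        )

    def forward(self, x, pref):
        # pref: 1-D tensor of length n_objectives (one preference, not a batch of them).
        flat = self.net(pref)                  # every target weight, generated on the fly
        w1, b1, w2, b2 = self._split(flat)
        h = F.relu(F.linear(x, w1, b1))        # run the generated target network on x
        return F.linear(h, w2, b2)

    def _split(self, flat):
        chunks, offset = [], 0
        for shape in self.target.shapes:
            n = int(torch.tensor(shape).prod())
            chunks.append(flat[offset:offset + n].view(*shape))
            offset += n
        return chunks

# Example usage: preference-specific predictions from one set of hypernetwork weights.
target = TargetShape(in_dim=32, hidden=64, out_dim=2)
hypernet = ParetoHypernet(n_objectives=2, target=target)
y = hypernet(torch.randn(8, 32), torch.tensor([0.7, 0.3]))
```

For this toy target (about 2.2k generated weights), the hypernetwork's output layer alone holds roughly 128 × 2,242 ≈ 287k parameters, which illustrates the overhead that conditioning directly in the feature space avoids.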
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.