Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks
- URL: http://arxiv.org/abs/2005.02426v2
- Date: Thu, 8 Oct 2020 06:54:36 GMT
- Title: Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks
- Authors: Zhishuai Guo, Mingrui Liu, Zhuoning Yuan, Li Shen, Wei Liu, Tianbao
Yang
- Abstract summary: We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
- Score: 50.42141893913188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study distributed algorithms for large-scale AUC
maximization with a deep neural network as a predictive model. Although
distributed learning techniques have been investigated extensively in deep
learning, they are not directly applicable to stochastic AUC maximization with
deep neural networks due to its striking differences from standard loss
minimization problems (e.g., cross-entropy). Towards addressing this challenge,
we propose and analyze a communication-efficient distributed optimization
algorithm based on a {\it non-convex concave} reformulation of the AUC
maximization, in which the communication of both the primal variable and the
dual variable between each worker and the parameter server only occurs after
multiple steps of gradient-based updates in each worker. Compared with the
naive parallel version of an existing algorithm that computes stochastic
gradients at individual machines and averages them for updating the model
parameters, our algorithm requires a much less number of communication rounds
and still achieves a linear speedup in theory. To the best of our knowledge,
this is the \textbf{first} work that solves the {\it non-convex concave
min-max} problem for AUC maximization with deep neural networks in a
communication-efficient distributed manner while still maintaining the linear
speedup property in theory. Our experiments on several benchmark datasets show
the effectiveness of our algorithm and also confirm our theory.
Related papers
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolleds and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - Partitioning sparse deep neural networks for scalable training and
inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z) - A Dynamical View on Optimization Algorithms of Overparameterized Neural
Networks [23.038631072178735]
We consider a broad class of optimization algorithms that are commonly used in practice.
As a consequence, we can leverage the convergence behavior of neural networks.
We believe our approach can also be extended to other optimization algorithms and network theory.
arXiv Detail & Related papers (2020-10-25T17:10:22Z) - Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real world problem of sensor localization.
We also show interesting connections between ADMM and lifted Markov chains besides providing an explicitly characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - Hyperspectral Unmixing Network Inspired by Unfolding an Optimization
Problem [2.4016406737205753]
The hyperspectral image (HSI) unmixing task is essentially an inverse problem, which is commonly solved by optimization algorithms.
We propose two novel network architectures, named U-ADMM-AENet and U-ADMM-BUNet, for abundance estimation and blind unmixing.
We show that the unfolded structures can find corresponding interpretations in machine learning literature, which further demonstrates the effectiveness of proposed methods.
arXiv Detail & Related papers (2020-05-21T18:49:45Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z) - Channel Assignment in Uplink Wireless Communication using Machine
Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to high computational complexity, machine learning approaches are employed to obtain computational efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.