Message Passing Descent for Efficient Machine Learning
- URL: http://arxiv.org/abs/2102.08110v1
- Date: Tue, 16 Feb 2021 12:22:54 GMT
- Title: Message Passing Descent for Efficient Machine Learning
- Authors: Francesco Concetti, Michael Chertkov
- Abstract summary: We propose a new iterative optimization method for the Data-Fitting (DF) problem in Machine Learning.
The approach relies on a Graphical Model representation of the DF problem.
We suggest the Message Passing Descent algorithm, which relies on the piece-wise-polynomial representation of the model DF function.
- Score: 4.416484585765027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new iterative optimization method for the {\bf Data-Fitting} (DF) problem in Machine Learning, e.g. Neural Network (NN) training. The approach relies on a {\bf Graphical Model} (GM) representation of the DF problem, where variables are fitting parameters and factors are associated with the Input-Output (IO) data. The GM results in the {\bf Belief Propagation} Equations considered in the {\bf Large Deviation Limit} corresponding to the practically important case when the number of IO samples is much larger than the number of fitting parameters. We suggest the {\bf Message Passing Descent} algorithm, which relies on the piece-wise-polynomial representation of the model DF function. In contrast with the popular gradient descent and related algorithms, our MPD algorithm relies on analytic (not automatic) differentiation, while also (and most importantly) it descends through the rugged DF landscape by \emph{making non-local updates of the parameters} at each iteration. The non-locality guarantees that MPD is not trapped in local minima, therefore resulting in better performance than locally-updated algorithms of the gradient-descent type. We illustrate the superior performance of the algorithm on a Feed-Forward NN with a single hidden layer and a piece-wise-linear activation function.
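To make the non-local update idea concrete: for a single-hidden-layer network with piece-wise-linear (e.g. ReLU) activation, the loss restricted to any one parameter (with the others frozen) is piece-wise-polynomial, so it can be minimized globally along that coordinate rather than nudged by a local gradient step. The sketch below is a toy numpy illustration of such non-local coordinate updates; it is not the authors' message-passing algorithm, and all function names are hypothetical.

```python
import numpy as np

def predict(X, W, v):
    """Single-hidden-layer NN with ReLU (piece-wise-linear) activation."""
    return np.maximum(X @ W, 0.0) @ v

def loss(X, y, W, v):
    return np.mean((predict(X, W, v) - y) ** 2)

def _with(W, idx, val):
    W2 = W.copy()
    W2[idx] = val
    return W2

def nonlocal_coordinate_descent(X, y, W, v, sweeps=20,
                                grid=np.linspace(-5, 5, 201)):
    """Toy non-local update: globally minimize the loss along one
    coordinate at a time by scanning candidate values, so the iterate
    can jump across local minima instead of descending into one."""
    for _ in range(sweeps):
        for idx in np.ndindex(W.shape):
            vals = [loss(X, y, _with(W, idx, g), v) for g in grid]
            W[idx] = grid[int(np.argmin(vals))]
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
W_true, v = rng.normal(size=(3, 4)), rng.normal(size=4)
y = predict(X, W_true, v)
W = nonlocal_coordinate_descent(X, y, rng.normal(size=(3, 4)), v)
print("final loss:", loss(X, y, W, v))
```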
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
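As background, here is a minimal numpy sketch of the standard two-point zero-order gradient estimator that gradient-free methods of this kind build on (the paper's specific algorithm, smoothness assumptions, and noise models are not reproduced here):

```python
import numpy as np

def zero_order_step(f, x, lr=0.1, mu=1e-3, rng=np.random.default_rng(0)):
    """One gradient-free step: estimate a directional derivative from two
    function evaluations along a random direction, then descend."""
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)
    g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)  # approx. grad(f) . u
    return x - lr * g * u

f = lambda x: np.sum((x - 1.0) ** 2)  # toy convex objective
x = np.zeros(5)
for _ in range(2000):
    x = zero_order_step(f, x)
print(x)  # approaches the minimizer (1, ..., 1)
```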
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- A Nonoverlapping Domain Decomposition Method for Extreme Learning Machines: Elliptic Problems [0.0]
Extreme learning machine (ELM) is a methodology for solving partial differential equations (PDEs) using a single hidden layer feed-forward neural network.
In this paper, we propose a nonoverlapping domain decomposition method (DDM) for ELMs that not only reduces the training time of ELMs, but is also suitable for parallel computation.
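For reference, a plain (single-domain) ELM fixes a random hidden layer and solves only a linear least-squares problem for the output weights; the following is a minimal sketch of that baseline, with the paper's domain-decomposition layer omitted:

```python
import numpy as np

def elm_fit(X, y, hidden=100, rng=np.random.default_rng(0)):
    """Extreme learning machine: random fixed hidden layer; output
    weights come from a single linear least-squares solve."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)                        # random features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # convex subproblem
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

X = np.random.default_rng(1).uniform(-1, 1, size=(500, 1))
y = np.sin(3 * X[:, 0])                           # toy regression target
W, b, beta = elm_fit(X, y)
print("train MSE:", np.mean((elm_predict(X, W, b, beta) - y) ** 2))
```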
arXiv Detail & Related papers (2024-06-22T23:25:54Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O\big(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T}\big)$ with a communication cost of $O(k \log(d))$ at each iteration.
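SketchedAMSGrad itself is not spelled out here; the sketch below shows the generic pattern it belongs to, an AMSGrad update applied to a compressed gradient, using a top-k compressor as a stand-in (real sketching schemes and error feedback are omitted):

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude coordinates; a worker then sends
    O(k) values plus indices instead of all d coordinates."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

def amsgrad_step(x, g, m, v, vhat, lr=0.1, b1=0.9, b2=0.99, eps=1e-8):
    """Standard AMSGrad update applied to the (compressed) gradient."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    vhat = np.maximum(vhat, v)              # AMSGrad max-correction
    return x - lr * m / (np.sqrt(vhat) + eps), m, v, vhat

d, k = 1000, 50
x = np.zeros(d)
m = v = vhat = np.zeros(d)
for _ in range(500):
    g = 2 * (x - 1.0)                       # toy quadratic gradient
    x, m, v, vhat = amsgrad_step(x, g=top_k(g, k), m=m, v=v, vhat=vhat)
```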
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
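Here is a minimal sketch of the recursive-least-squares half of this bridge, which fits each new streaming point while minimally disturbing the fit to past data (the paper's orthogonal-gradient view and the IPCA memory reduction are not reproduced):

```python
import numpy as np

def rls_update(w, P, x, y, lam=1.0):
    """One recursive-least-squares step for a linear model w . x."""
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    w = w + k * (y - w @ x)            # correct using the new point
    P = (P - np.outer(k, Px)) / lam    # update inverse-covariance proxy
    return w, P

rng = np.random.default_rng(0)
w_true = rng.normal(size=4)
w, P = np.zeros(4), 100.0 * np.eye(4)
for _ in range(300):                   # a single pass over the stream
    x = rng.normal(size=4)
    w, P = rls_update(w, P, x, w_true @ x)
print(np.linalg.norm(w - w_true))      # near zero after one pass
```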
arXiv Detail & Related papers (2022-07-28T02:01:31Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize a sum of ranked range losses, which also lacks efficient solvers.
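For orientation, the quantity being optimized can be evaluated directly; below is a small sketch of the empirical partial AUC over an FPR band (the paper's smoothed surrogate and its gradient method are not reproduced):

```python
import numpy as np

def partial_auc(pos, neg, fpr_lo=0.0, fpr_hi=0.3):
    """Empirical pAUC: correct-ranking rate over (positive, negative)
    pairs, restricted to negatives falling in the target FPR band."""
    neg_sorted = np.sort(neg)[::-1]            # hardest negatives first
    n = len(neg_sorted)
    band = neg_sorted[int(fpr_lo * n):int(np.ceil(fpr_hi * n))]
    return (pos[:, None] > band[None, :]).mean()

rng = np.random.default_rng(0)
print(partial_auc(rng.normal(1.0, 1.0, 500), rng.normal(0.0, 1.0, 500)))
```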
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
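The RDP construction itself is not given here; the toy below conveys the underlying idea, running a chain DP while keeping only a random subset of latent states per step, so cost drops from O(T S^2) to O(T k^2):

```python
import numpy as np

def randomized_chain_dp(unary, pairwise, k, rng=np.random.default_rng(0)):
    """Max-sum forward pass over a chain, restricted at each step to a
    random subset of k of the S latent states (a biased toy estimate)."""
    T, S = unary.shape
    keep = rng.choice(S, size=k, replace=False)
    alpha = unary[0, keep]
    for t in range(1, T):
        nxt = rng.choice(S, size=k, replace=False)
        alpha = (alpha[:, None] + pairwise[np.ix_(keep, nxt)]).max(0) \
                + unary[t, nxt]
        keep = nxt
    return alpha.max()

rng = np.random.default_rng(1)
unary = rng.normal(size=(20, 1000))        # 20 steps, 1000 latent states
pairwise = rng.normal(size=(1000, 1000))
print(randomized_chain_dp(unary, pairwise, k=64))
```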
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
- Graph Signal Restoration Using Nested Deep Algorithm Unrolling [85.53158261016331]
Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation, and brain networks, point cloud processing, and graph networks.
We propose two restoration methods based on convex-independent deep alternating direction method of multipliers (ADMM).
Parameters in the proposed restoration methods are trainable in an end-to-end manner.
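A minimal sketch of the generic idea follows: ADMM iterations unrolled a fixed number of times, with the per-iteration penalty and regularization weights playing the role of the end-to-end-trainable parameters (the paper's nested unrolling is not reproduced; the objective here is a simple Laplacian-smoothing denoiser):

```python
import numpy as np

def unrolled_admm_denoise(y, L, rho, theta):
    """Unrolled ADMM for min_x ||x - y||^2 + theta_t * x^T L x; the
    lists rho, theta are the would-be trainable per-layer parameters."""
    n = len(y)
    x, z, u = y.copy(), y.copy(), np.zeros(n)
    for r, th in zip(rho, theta):
        x = (2 * y + r * (z - u)) / (2 + r)                           # data fit
        z = np.linalg.solve(2 * th * L + r * np.eye(n), r * (x + u))  # smooth
        u = u + x - z                                                 # dual
    return z

n = 50
A = np.eye(n, k=1) + np.eye(n, k=-1)       # path graph adjacency
L = np.diag(A.sum(1)) - A                  # its graph Laplacian
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 3 * np.pi, n))
noisy = clean + 0.3 * rng.normal(size=n)
den = unrolled_admm_denoise(noisy, L, rho=[1.0] * 10, theta=[2.0] * 10)
print(np.mean((den - clean) ** 2), "vs", np.mean((noisy - clean) ** 2))
```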
arXiv Detail & Related papers (2021-06-30T08:57:01Z)
- Sliced Iterative Normalizing Flows [7.6146285961466]
We develop an iterative (greedy) deep learning (DL) algorithm which is able to transform an arbitrary probability distribution function (PDF) into the target PDF.
As special cases of this algorithm, we introduce two sliced iterative Normalizing Flow (SINF) models, which map from the data to the latent space (GIS) and vice versa.
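SINF's exact construction (optimal-transport 1-D maps along maximally informative slices) is not reproduced here; the sketch below conveys the sliced iterative idea in the data-to-latent direction using random orthogonal slices and a rank-based quantile transform (scipy is assumed for the normal inverse CDF):

```python
import numpy as np
from scipy.stats import norm

def sliced_gaussianize_step(X, rng):
    """One iteration: rotate to a random orthogonal basis, Gaussianize
    each 1-D marginal via its empirical quantiles, rotate back."""
    d = X.shape[1]
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))    # random slice directions
    Z = X @ Q
    for j in range(d):
        ranks = Z[:, j].argsort().argsort()
        Z[:, j] = norm.ppf((ranks + 0.5) / len(Z))  # match N(0, 1) marginal
    return Z @ Q.T

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2)) ** 3         # non-Gaussian toy data
for _ in range(30):
    X = sliced_gaussianize_step(X, rng)             # X -> approx. N(0, I)
```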
arXiv Detail & Related papers (2020-07-01T18:00:04Z)
- A Neural Network Approach for Online Nonlinear Neyman-Pearson Classification [3.6144103736375857]
We propose a novel Neyman-Pearson (NP) classifier that is both online and nonlinear, for the first time in the literature.
The proposed classifier operates on a binary labeled data stream in an online manner, and maximizes the detection power subject to a user-specified and controllable false positive rate.
Our algorithm is appropriate for large-scale data applications and provides decent false positive rate controllability with real-time processing.
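Here is a toy linear stand-in for the NP idea (the paper's classifier is nonlinear): learn the scores with an ordinary online update while adapting the decision threshold so the running false positive rate tracks a user-specified alpha. All names are illustrative.

```python
import numpy as np

def online_np_classifier(stream, alpha=0.05, lr=0.05, lr_b=0.01):
    """Online hinge-loss learner; the bias b is steered so that the
    false positive rate on negatives converges toward alpha."""
    w, b = None, 0.0
    for x, y in stream:                    # labels y in {0, 1}
        if w is None:
            w = np.zeros_like(x)
        s = w @ x - b
        if (s if y == 1 else -s) < 1:      # hinge-loss SGD step
            w += lr * (x if y == 1 else -x)
        if y == 0:                         # threshold adaptation:
            b += lr_b * (float(s > 0) - alpha)  # FPR -> alpha in equilibrium
    return w, b

rng = np.random.default_rng(0)
stream = [(rng.normal(y, 1.0, size=3), y) for y in rng.integers(0, 2, 5000)]
w, b = online_np_classifier(stream)
```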
arXiv Detail & Related papers (2020-06-14T20:00:25Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds while retaining theoretical guarantees.
Our experiments on several datasets show the effectiveness of our method and also confirm our theory.
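A linear-model sketch of the communication pattern, local surrogate-gradient steps on each machine with infrequent model averaging (the paper's deep-network, min-max formulation and its guarantees are not reproduced):

```python
import numpy as np

def auc_surrogate_grad(w, X_pos, X_neg):
    """Gradient of the pairwise square surrogate for AUC:
    mean over (pos, neg) pairs of (1 - w . (x_pos - x_neg))^2."""
    diff = X_pos[:, None, :] - X_neg[None, :, :]
    r = 1.0 - diff @ w
    return -2.0 * np.mean(r[..., None] * diff, axis=(0, 1))

def distributed_auc_max(shards, rounds=20, local_steps=10, lr=0.05):
    """Each shard runs local steps; models are averaged once per round,
    so communication happens `rounds` times, not every step."""
    w = np.zeros(shards[0][0].shape[1])
    for _ in range(rounds):
        local_models = []
        for X_pos, X_neg in shards:        # in practice: in parallel
            wl = w.copy()
            for _ in range(local_steps):
                wl -= lr * auc_surrogate_grad(wl, X_pos, X_neg)
            local_models.append(wl)
        w = np.mean(local_models, axis=0)  # one communication round
    return w

rng = np.random.default_rng(0)
shards = [(rng.normal(1.0, 1.0, (50, 5)), rng.normal(0.0, 1.0, (60, 5)))
          for _ in range(4)]
w = distributed_auc_max(shards)
```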
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.