Scalable Optimal Margin Distribution Machine
- URL: http://arxiv.org/abs/2305.04837v4
- Date: Sun, 11 Jun 2023 05:55:54 GMT
- Title: Scalable Optimal Margin Distribution Machine
- Authors: Yilin Wang, Nan Cao, Teng Zhang, Xuanhua Shi and Hai Jin
- Abstract summary: Optimal margin Distribution Machine (ODM) is a newly proposed statistical learning framework rooted in the novel margin theory.
This paper proposes a scalable ODM, which can achieve a nearly tenfold speedup over the original ODM training method.
- Score: 50.281535710689795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimal margin Distribution Machine (ODM) is a newly proposed statistical
learning framework rooted in the novel margin theory, which demonstrates
better generalization performance than traditional large-margin-based
counterparts. Nonetheless, like other kernel methods, it suffers from the
ubiquitous scalability problem in both computation time and memory. This
paper proposes a scalable ODM, which can achieve a nearly tenfold speedup
over the original ODM training method. For nonlinear kernels, we propose a
novel distribution-aware partition method so that the local ODM trained on
each partition stays close, and converges fast, to the global one. When a
linear kernel is applied, we extend a communication-efficient SVRG method to
accelerate training further. Extensive empirical studies validate that the
proposed method is highly computationally efficient and almost never worsens
generalization.
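To make the linear-kernel acceleration more concrete, the sketch below shows a plain single-machine SVRG loop on an L2-regularised squared-hinge objective. It only illustrates the variance-reduction step that the communication-efficient extension builds on; the actual ODM objective, the distribution-aware partition for nonlinear kernels, and the distributed communication scheme are not reproduced here, and all function names and hyper-parameters are illustrative assumptions.

```python
import numpy as np

def svrg_squared_hinge(X, y, lam=1e-3, eta=0.01, epochs=10, seed=0):
    """Plain SVRG on an L2-regularised squared-hinge loss (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    def grad_i(w, i):
        # per-sample gradient of 0.5*lam*||w||^2 + max(0, 1 - y_i <x_i, w>)^2
        margin = 1.0 - y[i] * (X[i] @ w)
        g = lam * w
        if margin > 0:
            g -= 2.0 * margin * y[i] * X[i]
        return g

    for _ in range(epochs):
        w_snap = w.copy()
        # full gradient at the snapshot, used as the control variate
        margins = np.maximum(0.0, 1.0 - y * (X @ w_snap))
        full_grad = lam * w_snap - 2.0 / n * (X.T @ (margins * y))
        for _ in range(n):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= eta * v
    return w

# toy usage on synthetic linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
w = svrg_squared_hinge(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```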
Related papers
- Blind Super-Resolution via Meta-learning and Markov Chain Monte Carlo Simulation [46.5310645609264]
We propose a Meta-learning and Markov Chain Monte Carlo based SISR approach to learn kernel priors from organized randomness.
A lightweight network is adopted as the kernel generator, and is optimized via learning from the MCMC simulation on random Gaussian distributions.
A meta-learning-based alternating optimization procedure is proposed to optimize the kernel generator and image restorer.
arXiv Detail & Related papers (2024-06-13T07:50:15Z)
- Local Methods with Adaptivity via Scaling [71.11111992280566]
This paper aims to merge the local training technique with the adaptive approach to develop efficient distributed learning methods.
We consider the classical Local SGD method and enhance it with a scaling feature.
In addition to theoretical analysis, we validate the performance of our methods in practice by training a neural network.
arXiv Detail & Related papers (2024-06-02T19:50:05Z)
- Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel [22.23550794664218]
This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework.
The framework incorporates a quantized alternating direction method of multipliers (ADMM) for collaborative learning among multiple agents.
Experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.
arXiv Detail & Related papers (2023-09-15T07:05:33Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Distributionally Robust Federated Averaging [19.875176871167966]
We present communication-efficient distributed algorithms for distributionally robust learning via periodic averaging with adaptive sampling.
We give corroborating experimental evidence for our theoretical results in federated learning settings.
arXiv Detail & Related papers (2021-02-25T03:32:09Z)
- Distributed Optimization, Averaging via ADMM, and Network Topology [0.0]
We study the connection between network topology and convergence rates for different algorithms on a real-world sensor localization problem.
We also show interesting connections between ADMM and lifted Markov chains, besides providing an explicit characterization of its convergence.
arXiv Detail & Related papers (2020-09-05T21:44:39Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
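As a rough illustration of the layer-wise adaptive rate scaling idea behind LARS/CLARS-style large-batch training mentioned in the entry above, the sketch below rescales each parameter's update by a trust ratio (weight norm over gradient norm). It is a generic sketch, not the CLARS algorithm from that paper; the function name, hyper-parameters, and the toy model are assumptions for illustration.

```python
import torch

def layerwise_adaptive_step(model, base_lr=0.05, weight_decay=1e-4, eps=1e-8):
    """One layer-wise adaptive rate scaling step in the spirit of LARS (sketch)."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad + weight_decay * p  # regularised gradient
            w_norm = p.norm().item()
            g_norm = g.norm().item()
            # trust ratio: match the update magnitude to the layer's weight norm
            trust = w_norm / (g_norm + eps) if w_norm > 0 and g_norm > 0 else 1.0
            p.add_(g, alpha=-base_lr * trust)

# toy usage: one step on a small linear model after backpropagating a loss
model = torch.nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
layerwise_adaptive_step(model)
```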
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.