Big Learning Expectation Maximization
- URL: http://arxiv.org/abs/2312.11926v1
- Date: Tue, 19 Dec 2023 08:07:41 GMT
- Title: Big Learning Expectation Maximization
- Authors: Yulai Cong, Sijia Li
- Abstract summary: We present the Big Learning EM (BigLearn-EM), an EM upgrade that simultaneously performs joint, marginal, and orthogonally transformed marginal matchings.
We empirically show that the BigLearn-EM is capable of delivering the optimal with high probability.
- Score: 13.709094150105566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture models serve as one fundamental tool with versatile applications.
However, their training techniques, like the popular Expectation Maximization
(EM) algorithm, are notoriously sensitive to parameter initialization and often
suffer from bad local optima that could be arbitrarily worse than the optimal.
To address the long-lasting bad-local-optima challenge, we draw inspiration
from the recent ground-breaking foundation models and propose to leverage their
underlying big learning principle to upgrade the EM. Specifically, we present
the Big Learning EM (BigLearn-EM), an EM upgrade that simultaneously performs
joint, marginal, and orthogonally transformed marginal matchings between data
and model distributions. Through simulated experiments, we empirically show
that the BigLearn-EM is capable of delivering the optimal with high
probability; comparisons on benchmark clustering datasets further demonstrate
its effectiveness and advantages over existing techniques. The code is
available at
https://github.com/YulaiCong/Big-Learning-Expectation-Maximization.
Related papers
- Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback [64.67540769692074]
Large language models (LLMs) fine-tuned with alignment techniques, such as reinforcement learning from human feedback, have been instrumental in developing some of the most capable AI systems to date.
We introduce an approach called Margin Matching Preference Optimization (MMPO), which incorporates relative quality margins into optimization, leading to improved LLM policies and reward models.
Experiments with both human and AI feedback data demonstrate that MMPO consistently outperforms baseline methods, often by a substantial margin, on popular benchmarks including MT-bench and RewardBench.
arXiv Detail & Related papers (2024-10-04T04:56:11Z) - Delegating Data Collection in Decentralized Machine Learning [67.0537668772372]
Motivated by the emergence of decentralized machine learning (ML) ecosystems, we study the delegation of data collection.
We design optimal and near-optimal contracts that deal with two fundamental information asymmetries.
We show that a principal can cope with such asymmetry via simple linear contracts that achieve 1-1/e fraction of the optimal utility.
arXiv Detail & Related papers (2023-09-04T22:16:35Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Stochastic First-Order Learning for Large-Scale Flexibly Tied Gaussian
Mixture Model [3.4546761246181696]
We propose a new optimization algorithm on the manifold of Gaussian Mixture Models (GMMs)
We observe that methods can outperform the expectation-maximization algorithm in terms of attaining better likelihood, needing fewer epochs for convergence, and consuming less time per each epoch.
arXiv Detail & Related papers (2022-12-11T04:24:52Z) - Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyper parameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z) - Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method textitalways outperforms Behavioral Cloning in the low data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Meta Hamiltonian Learning [0.0]
We use a machine learning technique known as meta-learning to learn a more efficient drifting for this task.
We observe that the meta-optimizer outperforms other optimization methods in average loss over test samples.
arXiv Detail & Related papers (2021-04-09T16:01:34Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Expected Information Maximization: Using the I-Projection for Mixture
Density Estimation [22.096148237257644]
Modelling highly multi-modal data is a challenging problem in machine learning.
We present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection.
We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches.
arXiv Detail & Related papers (2020-01-23T17:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.