Mixture of experts models for multilevel data: modelling framework and
approximation theory
- URL: http://arxiv.org/abs/2209.15207v1
- Date: Fri, 30 Sep 2022 03:29:32 GMT
- Title: Mixture of experts models for multilevel data: modelling framework and
approximation theory
- Authors: Tsz Chai Fung, Spark C. Tseung
- Abstract summary: We study a class of mixed MoE (MMoE) models for multilevel data.
The MMoE has the potential to accurately capture almost all characteristics inherent in multilevel data.
A nested version of the MMoE universally approximates a broad range of dependence structures of the random effects among different factor levels.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilevel data are prevalent in many real-world applications. However, it
remains an open research problem to identify and justify a class of models that
flexibly capture a wide range of multilevel data. Motivated by the versatility
of the mixture of experts (MoE) models in fitting regression data, in this
article we build upon the MoE and study a class of mixed MoE (MMoE) models for
multilevel data. Under some regularity conditions, we prove that the MMoE is
dense, in the sense of weak convergence, in the space of continuous mixed
effects models. As a result, the MMoE has the potential to accurately capture
almost all characteristics inherent in multilevel data, including the marginal
distributions, dependence structures, regression links, random intercepts and
random slopes. In the particular case where the multilevel data are hierarchical,
we further show that a nested version of the MMoE universally approximates a
broad range of dependence structures of the random effects among different
factor levels.
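To make the modelling framework concrete, here is one hedged way to write an MMoE density (the notation below is illustrative, not taken verbatim from the paper): observations y_ij in cluster i share a random effect u_i that enters a softmax gating network, so within-cluster dependence arises through the shared gates, and the cluster-level likelihood integrates the random effect out.

```latex
% Illustrative mixed mixture-of-experts (MMoE) density; notation assumed.
f\bigl(y_{ij} \mid x_{ij}, u_i\bigr)
  = \sum_{k=1}^{K}
    \frac{\exp\{\alpha_k^{\top} x_{ij} + \beta_k^{\top} u_i\}}
         {\sum_{l=1}^{K} \exp\{\alpha_l^{\top} x_{ij} + \beta_l^{\top} u_i\}}
    \, g_k\bigl(y_{ij} \mid x_{ij}\bigr),
\qquad
L_i = \int \prod_{j} f\bigl(y_{ij} \mid x_{ij}, u\bigr)\, \mathrm{d}F(u),
```

where the g_k are expert densities and F is the random-effect distribution; random intercepts and slopes arise when u_i also enters the experts.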
Related papers
- CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [54.85000884785013]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data.
We propose CLIPfusion, a method that leverages both discriminative and generative foundation models.
We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z)
- Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments [8.494154839146622]
Federated Learning (FL) is a decentralized machine learning paradigm that enables clients to collaboratively train models while preserving data privacy.
We propose Mosaic, a novel data-free knowledge distillation framework tailored for heterogeneous distributed environments.
Mosaic forms a Mixture-of-Experts (MoE) from client models based on their specialized knowledge, and distills it into a global model using the generated data.
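As a loose, self-contained illustration of the MoE-from-clients idea (a toy construction of ours, not Mosaic's actual algorithm; the client models, label priors, and gating rule below are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: 3 clients, each a linear classifier (5 features, 4 classes).
n_clients, d, n_classes = 3, 5, 4
client_models = [rng.normal(size=(d, n_classes)) for _ in range(n_clients)]
# Stand-in for "specialized knowledge": each client's label distribution.
label_priors = softmax(rng.normal(size=(n_clients, n_classes)))

def moe_teacher(x):
    """Ensemble the clients into an MoE-style teacher: each client's softmax
    output is reweighted per class by that client's label prior (a crude
    proxy for gating on specialization), then renormalized."""
    probs = np.stack([softmax(x @ w) for w in client_models])  # (clients, n, classes)
    mix = (probs * label_priors[:, None, :]).sum(axis=0)       # favour specialists
    return mix / mix.sum(axis=1, keepdims=True)

x_syn = rng.normal(size=(8, d))    # stand-in for generator-produced samples
soft_targets = moe_teacher(x_syn)  # distillation targets for a global model
```

A global student would then be trained to match soft_targets on the generated data, which is the distillation step.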
arXiv Detail & Related papers (2025-05-26T08:52:49Z)
- Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals [3.669506968635671]
Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution.
In this paper, we combine MCTM with state-of-the-art and autoregressive NF to leverage the transparency of MCTM for modeling interpretable feature effects.
We demonstrate our method's versatility in various numerical experiments and compare it with MCTM and other NF models on both simulated and real-world data.
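For readers unfamiliar with the MCTM building block: transformation models of this kind typically use monotone Bernstein polynomials as interpretable marginal transformations. The form below is the standard one from that literature and may differ in details from this paper's parameterization.

```latex
% Monotone Bernstein-polynomial transformation (standard form; details assumed).
h(y) = \sum_{k=0}^{M} \vartheta_k \binom{M}{k}\, \tilde{y}^{\,k} (1 - \tilde{y})^{M-k},
\qquad \tilde{y} = \frac{y - a}{b - a} \in [0, 1],
```

with the constraint ϑ_0 ≤ ϑ_1 ≤ … ≤ ϑ_M guaranteeing monotonicity, so h can serve as an interpretable marginal map inside a more flexible flow.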
arXiv Detail & Related papers (2025-05-20T10:17:07Z)
- Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.
Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.
We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
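A minimal sketch of one such signal (our illustration of the general idea, not the paper's exact scoring rule): rank models by the mean Shannon entropy of their softmax outputs on unlabeled inputs, lower entropy being read as higher confidence.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_predictive_entropy(logits):
    """Average Shannon entropy of softmax outputs; one simple
    label-free uncertainty signal for ranking models."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

# Toy ranking: lower mean entropy => more confident => ranked higher.
rng = np.random.default_rng(0)
model_logits = {"model_a": rng.normal(0, 3.0, (100, 10)),   # peaked -> confident
                "model_b": rng.normal(0, 0.5, (100, 10))}   # flat -> uncertain
ranking = sorted(model_logits, key=lambda m: mean_predictive_entropy(model_logits[m]))
print(ranking)  # ['model_a', 'model_b']; assumes confidence tracks quality
```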
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- MMBind: Unleashing the Potential of Distributed and Heterogeneous Data for Multimodal Learning in IoT [11.884646027921173]
We propose MMBind, a new framework for multimodal learning on distributed and heterogeneous IoT data.
We demonstrate that data of different modalities observing similar events, even captured at different times and locations, can be effectively used for multimodal training.
arXiv Detail & Related papers (2024-11-18T23:34:07Z)
- Supervised Multi-Modal Fission Learning [19.396207029419813]
Learning from multimodal datasets can leverage complementary information and improve performance in prediction tasks.
We propose a Multi-Modal Fission Learning model that simultaneously identifies globally joint, partially joint, and individual components.
arXiv Detail & Related papers (2024-09-30T17:58:03Z)
- Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities.
Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation.
Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
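For reference, the deterministic softmax-gating MoE regression function studied in this setting typically takes the following form (standard formulation; notation assumed):

```latex
% Softmax-gating mixture-of-experts regression function; notation assumed.
f(x) = \sum_{k=1}^{K}
  \frac{\exp\{a_k^{\top} x + b_k\}}{\sum_{l=1}^{K} \exp\{a_l^{\top} x + b_l\}}\,
  h\bigl(x, \eta_k\bigr),
```

where h(·, η_k) are the expert functions; the LSE minimizes the sum of squared residuals over the gating and expert parameters, and strong identifiability conditions on h determine the attainable convergence rates.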
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- Learning Hierarchical Features with Joint Latent Space Energy-Based Prior [44.4434704520236]
We study the fundamental problem of multi-layer generator models in learning hierarchical representations.
We propose a joint latent space EBM prior model with multi-layer latent variables for effective hierarchical representation learning.
arXiv Detail & Related papers (2023-10-14T15:44:14Z)
- Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational objective that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that avoid the inductive biases in PoE or MoE approaches.
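For context on the aggregation schemes being improved upon, here is the standard product-of-experts (PoE) rule for Gaussian posteriors (a known closed form, shown for illustration; it is the baseline, not the paper's proposal):

```python
import numpy as np

def poe_gaussian(mus, sigmas2):
    """Product-of-Experts for Gaussian posteriors q_m(z) = N(mu_m, sigma2_m):
    the renormalized product is Gaussian with summed precisions and a
    precision-weighted mean."""
    precisions = 1.0 / np.asarray(sigmas2)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * np.asarray(mus)).sum(axis=0)
    return mu, var

# Two modality-specific Gaussian posteriors over a 3-dim latent:
mus = np.array([[0.0, 1.0, -1.0], [2.0, 1.0, 0.0]])
sigmas2 = np.array([[1.0, 0.5, 1.0], [0.25, 2.0, 1.0]])
mu, var = poe_gaussian(mus, sigmas2)
print(mu, var)  # the sharper (lower-variance) expert dominates each dimension
```

The precision weighting means the most confident modality dominates each latent dimension, which is exactly the kind of inductive bias the more flexible, permutation-invariant aggregation schemes aim to relax.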
arXiv Detail & Related papers (2023-09-01T10:32:21Z)
- Learning Joint Latent Space EBM Prior Model for Multi-layer Generator [44.4434704520236]
We study the fundamental problem of learning multi-layer generator models.
We propose an energy-based model (EBM) on the joint latent space over all layers of latent variables.
Our experiments demonstrate that the learned model can be expressive in generating high-quality images.
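The usual construction of a latent-space EBM prior is an exponential tilting of a simple reference prior (our reading of the standard form; the paper's exact formulation may differ):

```latex
% Joint latent-space EBM prior as exponential tilting; form assumed.
p_{\alpha}(z) = \frac{1}{Z(\alpha)} \exp\bigl\{f_{\alpha}(z)\bigr\}\, p_0(z),
\qquad z = (z_1, \dots, z_L),
```

where p_0 is a simple (e.g., Gaussian) prior over all L layers of latent variables, f_α is a learned energy correction, and Z(α) is the normalizing constant.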
arXiv Detail & Related papers (2023-06-10T00:27:37Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously.
We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework.
The results show that our model can achieve state-of-the-art performance.
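As background, the canonical Gaussian FMR density that such models extend (standard formulation; notation assumed):

```latex
% Canonical finite mixture regression (FMR) density; notation assumed.
f(y \mid x) = \sum_{k=1}^{K} \pi_k \,
  \phi\bigl(y;\, \beta_k^{\top} x,\, \sigma_k^{2}\bigr),
\qquad \pi_k \ge 0,\quad \sum_{k=1}^{K} \pi_k = 1,
```

where φ is the Gaussian density and each component k plays the role of a sample cluster with its own regression coefficients β_k; the cited model generalizes this to multiple incomplete, mixed-type targets.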
arXiv Detail & Related papers (2020-10-12T03:27:07Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.