Robust Hybrid Learning With Expert Augmentation
- URL: http://arxiv.org/abs/2202.03881v3
- Date: Tue, 11 Apr 2023 19:51:36 GMT
- Title: Robust Hybrid Learning With Expert Augmentation
- Authors: Antoine Wehenkel, Jens Behrmann, Hsiang Hsu, Guillermo Sapiro, Gilles Louppe, Jörn-Henrik Jacobsen
- Abstract summary: We introduce a hybrid data augmentation strategy termed \textit{expert augmentation}.
We demonstrate that expert augmentation, which can be incorporated into existing hybrid systems, improves generalization.
We also assess the potential real-world applicability of expert augmentation on a dataset of a real double pendulum.
- Score: 31.911717646180886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hybrid modelling reduces the misspecification of expert models by combining
them with machine learning (ML) components learned from data. As with many ML
algorithms, the performance guarantees of hybrid models are limited to the
training distribution. Leveraging the insight that the expert model is usually
valid even outside the training domain, we overcome this limitation by
introducing a hybrid data augmentation strategy termed \textit{expert
augmentation}. Based on a probabilistic formalization of hybrid modelling, we
demonstrate that expert augmentation, which can be incorporated into existing
hybrid systems, improves generalization. We empirically validate expert
augmentation in three controlled experiments modelling dynamical systems with
ordinary and partial differential equations. Finally, we assess the potential
real-world applicability of expert augmentation on a dataset of a real double
pendulum.
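To make the mechanism concrete, below is a minimal sketch of expert augmentation on a toy one-dimensional system. The sinusoidal expert model, the polynomial residual learner, and the parameter ranges are all illustrative assumptions, not the authors' implementation.

# A minimal sketch of expert augmentation, assuming a toy 1-D system; every
# modelling choice here is illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def expert(x, psi):
    """Toy expert model: a sinusoid with amplitude/frequency parameters psi."""
    return psi[0] * np.sin(psi[1] * x)

def fit_residual(x, r):
    """Stand-in for the learned ML component: a cubic fit to the residual
    the expert model leaves unexplained."""
    return np.poly1d(np.polyfit(x, r, deg=3))

# 1) Fit the hybrid model on in-distribution data (expert parameters
#    psi_train are taken as known here; hybrid training would infer them).
x = np.linspace(0.0, 2.0 * np.pi, 200)
psi_train = np.array([1.0, 1.0])
y = expert(x, psi_train) + 0.1 * x + 0.05 * rng.normal(size=x.size)
residual_model = fit_residual(x, y - expert(x, psi_train))

# 2) Expert augmentation: sample expert parameters beyond the training
#    range and synthesize targets with the *same* learned residual,
#    exploiting the assumption that the expert stays valid out there.
augmented = []
for _ in range(16):
    psi_new = rng.uniform([0.5, 0.5], [2.0, 3.0])  # wider than training
    augmented.append((psi_new, expert(x, psi_new) + residual_model(x)))

# 3) The augmented pairs would then be folded back into training so the
#    hybrid system generalizes across expert-parameter shifts.
print(len(augmented), "augmented trajectories")

The key step is (2): because the expert model is trusted outside the training domain, pairing it with the already-learned ML residual yields plausible out-of-distribution training data at essentially no cost.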
Related papers
- HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation [72.69742127579508]
Recent unified models integrate understanding experts (e.g., LLMs) with generative experts (e.g., diffusion models). In this work, we propose HBridge, an asymmetric H-shaped architecture that enables heterogeneous experts to optimally leverage pretrained priors. Extensive experiments across multiple benchmarks demonstrate the effectiveness and superior performance of HBridge.
arXiv Detail & Related papers (2025-11-25T17:23:38Z)
- Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization [60.309915093470416]
Matryoshka MoE (M-MoE) is a training framework that instills a coarse-to-fine structure directly into the expert ensemble. Our work paves the way for more practical and adaptable deployments of large-scale MoE models.
arXiv Detail & Related papers (2025-09-30T16:56:44Z)
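As a hedged illustration of the elastic inference-time utilization described in the Matryoshka MoE entry above: one router, trained with nested expert budgets, can be queried with different k at test time. Names and details below are illustrative, not the paper's code.

import numpy as np

def route_topk(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-renormalize their
    weights, so one router supports several expert budgets at inference."""
    idx = np.argsort(gate_logits)[-k:]
    w = np.exp(gate_logits[idx] - gate_logits[idx].max())
    return idx, w / w.sum()

logits = np.array([0.2, 1.5, -0.3, 0.9, 2.1])
for k in (1, 2, 4):  # same router, elastic expert budgets
    print(k, route_topk(logits, k))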
- A Unified Virtual Mixture-of-Experts Framework: Enhanced Inference and Hallucination Mitigation in Single-Model System [9.764336669208394]
Generative models, such as GPT and BERT, have significantly improved performance in tasks like text generation and summarization.
However, hallucinations, where models generate non-factual or misleading content, are especially problematic in smaller-scale architectures.
We propose a unified Virtual Mixture-of-Experts (MoE) fusion strategy that enhances inference performance and mitigates hallucinations in a single Qwen 1.5 0.5B model.
arXiv Detail & Related papers (2025-04-01T11:38:01Z)
- Learning Mixtures of Experts with EM: A Mirror Descent Perspective [28.48469221248906]
Classical Mixtures of Experts (MoE) are machine learning models that partition the input space, with a separate "expert" model trained on each partition. We study theoretical guarantees of the Expectation Maximization (EM) algorithm for the training of MoE models.
arXiv Detail & Related papers (2024-11-09T03:44:09Z)
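For background on the entry above, these are the textbook EM updates for a gated mixture of experts, in generic notation; the paper's contribution is a mirror-descent reinterpretation of these steps, which is not derived here.

% E-step: responsibility of expert k for sample (x_i, y_i)
\gamma_{ik} = \frac{\pi_k(x_i; \theta)\, p_k(y_i \mid x_i; w_k)}
                   {\sum_j \pi_j(x_i; \theta)\, p_j(y_i \mid x_i; w_j)}
% M-step: each expert (and the gate) solves a responsibility-weighted
% maximum-likelihood problem
w_k \leftarrow \arg\max_w \sum_i \gamma_{ik} \log p_k(y_i \mid x_i; w)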
- Automatically Learning Hybrid Digital Twins of Dynamical Systems [56.69628749813084]
Digital Twins (DTs) simulate the states and temporal dynamics of real-world systems.
DTs often struggle to generalize to unseen conditions in data-scarce settings.
In this paper, we propose an evolutionary algorithm (HDTwinGen) to autonomously propose, evaluate, and optimize hybrid digital twins (HDTwins).
arXiv Detail & Related papers (2024-10-31T07:28:22Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency.
We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures.
The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
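The grouping-then-pruning idea in the entry above can be sketched generically as follows; the cosine criterion, the greedy pass, and the threshold are illustrative assumptions rather than the paper's exact method.

import numpy as np

def group_similar_experts(expert_weights, threshold=0.9):
    """Greedily group experts whose flattened, normalized weights have
    cosine similarity above `threshold`; keep one representative each."""
    flat = [w.ravel() / np.linalg.norm(w) for w in expert_weights]
    groups, kept = [], []
    for i, v in enumerate(flat):
        for g in groups:
            if float(flat[g[0]] @ v) > threshold:
                g.append(i)
                break
        else:  # no similar group found: i starts a new group and is kept
            groups.append([i])
            kept.append(i)
    return groups, kept

# Two parallel experts collapse into one group; the third stays separate.
experts = [np.ones((2, 2)), 2.0 * np.ones((2, 2)), np.eye(2)]
print(group_similar_experts(experts))  # ([[0, 1], [2]], [0, 2])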
- Hybrid Quantum-inspired Resnet and Densenet for Pattern Recognition [1.0499611180329804]
We propose two hybrid quantum-inspired neural networks, with residual and dense connections respectively, for pattern recognition.
Numerical experiments show that our hybrid models possess the same generalization power as the pure classical models.
A further study indicates that the recognition accuracy of our hybrid models is 2-3% higher than that of quantum neural networks without residual or dense connections.
arXiv Detail & Related papers (2024-03-09T01:34:26Z)
- Causal hybrid modeling with double machine learning [4.190790144182304]
Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws.
This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing Double Machine Learning (DML) to estimate causal effects.
We demonstrate that DML-based hybrid modeling is superior to end-to-end deep neural network (DNN) approaches for estimating causal parameters, offering efficiency, robustness to bias from regularization methods, and a way around equifinality.
arXiv Detail & Related papers (2024-02-20T19:19:56Z)
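As background for the causal hybrid modeling entry above, the standard DML partialling-out estimator for a scalar effect has the textbook form below; cross-fitting of the nuisance models is omitted and the notation is generic, not the paper's.

% Residual-on-residual regression with ML-estimated nuisances
% \hat{m}(X) \approx E[Y \mid X], \qquad \hat{g}(X) \approx E[T \mid X]
\hat{\theta} \;=\; \frac{\sum_i \bigl(Y_i - \hat{m}(X_i)\bigr)\bigl(T_i - \hat{g}(X_i)\bigr)}
                        {\sum_i \bigl(T_i - \hat{g}(X_i)\bigr)^2}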
- Augment on Manifold: Mixup Regularization with UMAP [5.18337967156149]
This paper proposes a Mixup regularization scheme, referred to as UMAP Mixup, for automated data augmentation in deep learning predictive models.
The proposed approach ensures that the Mixup operations result in synthesized samples that lie on the data manifold of the features and labels.
arXiv Detail & Related papers (2023-12-20T16:02:25Z)
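For the UMAP Mixup entry above, recall plain mixup, sketched below in its ambient-space form: synthetic samples are convex combinations of training pairs. Per the abstract, UMAP Mixup instead performs this interpolation so that synthesized samples stay on a UMAP-learned manifold; those details are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Classic mixup: interpolate a pair of samples and their labels with a
    Beta-distributed coefficient."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_new, y_new = mixup(np.array([0.0, 1.0]), 0.0, np.array([1.0, 0.0]), 1.0)
print(x_new, y_new)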
- Hybrid additive modeling with partial dependence for supervised regression and dynamical systems forecasting [5.611231523622238]
We introduce a new hybrid training approach based on partial dependence, which removes the need for intricate regularization.
We compare, on both synthetic and real regression problems, several approaches for training such hybrid models.
Experiments are carried out with different types of machine learning models, including tree-based models and artificial neural networks.
arXiv Detail & Related papers (2023-07-05T12:13:56Z)
- Prediction Sets for High-Dimensional Mixture of Experts Models [9.195729979000404]
We show how to construct valid prediction sets for an $\ell_1$-penalized mixture of experts model in the high-dimensional setting.
We make use of a debiasing procedure to account for the bias induced by the penalization and propose a novel strategy for combining intervals to form a prediction set.
arXiv Detail & Related papers (2022-10-30T00:27:19Z)
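The debiasing procedure mentioned in the prediction-sets entry above is typically of the standard debiased-lasso form shown below, with \hat{\Theta} an estimate of the precision matrix; this is the textbook construction, and the paper's exact variant may differ.

% Debiased estimator correcting the \ell_1-penalization bias
\hat{\beta}^{\mathrm{debiased}} \;=\; \hat{\beta} \;+\; \tfrac{1}{n}\, \hat{\Theta}\, X^\top \bigl(y - X\hat{\beta}\bigr)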
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
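A bare-bones version of the column-wise iterative imputation loop that HyperImpute generalizes is sketched below; the per-column linear model is an illustrative stand-in for the framework's automatic model selection.

import numpy as np
from sklearn.linear_model import LinearRegression

def iterative_impute(X, n_iters=5):
    X = X.copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # mean initialization
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            other = np.delete(X, j, axis=1)  # regress column j on the rest
            model = LinearRegression().fit(other[~miss], X[~miss, j])
            X[miss, j] = model.predict(other[miss])
    return X

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
print(iterative_impute(X))  # the missing entry should land near 4.0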
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
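For the DRO entry above, the generic objective with a parametric likelihood-ratio adversary is shown below in schematic form; the paper's precise parametrization and constraints on r_\phi differ in detail.

% Worst-case risk over reweightings of the training distribution p, with a
% parametric ratio r_\phi \ge 0 normalized so that E_p[r_\phi(x)] = 1
\min_\theta \; \max_\phi \; \mathbb{E}_{x \sim p}\bigl[\, r_\phi(x)\, \ell(x; \theta) \,\bigr]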
- Conformer-based Hybrid ASR System for Switchboard Dataset [99.88988282353206]
We present and evaluate a competitive conformer-based hybrid model training recipe.
We study different training aspects and methods to improve the word error rate as well as to increase training speed.
We conduct experiments on the Switchboard 300h dataset, and our conformer-based hybrid model achieves competitive results.
arXiv Detail & Related papers (2021-11-05T12:03:18Z)
- A Variational Infinite Mixture for Probabilistic Inverse Dynamics Learning [34.90240171916858]
We develop an efficient variational Bayes inference technique for infinite mixtures of probabilistic local models.
We highlight the model's power in combining data-driven adaptation, fast prediction and the ability to deal with discontinuous functions and heteroscedastic noise.
We use the learned models for online dynamics control of a Barrett-WAM manipulator, significantly improving the trajectory tracking performance.
arXiv Detail & Related papers (2020-11-10T16:15:13Z)
- Training Deep Energy-Based Models with f-Divergence Minimization [113.97274898282343]
Deep energy-based models (EBMs) are very flexible in distribution parametrization but computationally challenging to train.
We propose a general variational framework termed f-EBM to train EBMs using any desired f-divergence.
Experimental results demonstrate the superiority of f-EBM over contrastive divergence, as well as the benefits of training EBMs using f-divergences other than KL.
arXiv Detail & Related papers (2020-03-06T23:11:13Z)
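The f-EBM entry above builds on the variational (Fenchel dual) representation of an f-divergence, stated below in textbook form with f^* the convex conjugate of f; the paper derives its EBM training objective from this identity.

% Variational representation of an f-divergence (supremum over functions T)
D_f(P \,\|\, Q) \;=\; \sup_{T} \; \mathbb{E}_{x \sim P}\bigl[T(x)\bigr] \;-\; \mathbb{E}_{x \sim Q}\bigl[f^{*}\bigl(T(x)\bigr)\bigr]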
- Hybrid modeling: Applications in real-time diagnosis [64.5040763067757]
We outline a novel hybrid modeling approach that combines machine-learning-inspired models and physics-based models.
We use such models for real-time diagnosis applications.
arXiv Detail & Related papers (2020-03-04T00:44:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.