Mixture of Raytraced Experts
- URL: http://arxiv.org/abs/2507.12419v1
- Date: Wed, 16 Jul 2025 17:08:46 GMT
- Title: Mixture of Raytraced Experts
- Authors: Andrea Perin, Giacomo Lagomarsini, Claudio Gallicchio, Giuseppe Nuti
- Abstract summary: We introduce a stacked Mixture of Experts architecture which can dynamically select sequences of experts. We train our model by iteratively sampling from a set of candidate experts, unfolding the sequence akin to how Recurrent Neural Networks are trained. Preliminary experiments show a reduction in training epochs of 10% to 40% with comparable or higher accuracy.
- Score: 4.059745493584863
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce a Mixture of Raytraced Experts, a stacked Mixture of Experts (MoE) architecture which can dynamically select sequences of experts, producing computational graphs of variable width and depth. Existing MoE architectures generally require a fixed amount of computation for a given sample. Our approach, in contrast, yields predictions with increasing accuracy as the computation cycles through the experts' sequence. We train our model by iteratively sampling from a set of candidate experts, unfolding the sequence akin to how Recurrent Neural Networks are trained. Our method does not require load-balancing mechanisms, and preliminary experiments show a reduction in training epochs of 10% to 40% with comparable or higher accuracy. These results point to new research directions in the field of MoEs, allowing the design of potentially faster and more expressive models. The code is available at https://github.com/nutig/RayTracing
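As a rough illustration of the idea in the abstract, the sketch below (not the authors' implementation; see the linked repository for that) stacks a few experts behind a router that, at every step, samples the next expert to visit, so each input traces its own sequence of experts and a prediction can be read out after any number of cycles. All module names, sizes, and the read-out head are hypothetical.

```python
# Minimal, illustrative sketch of sequential expert selection with anytime read-out.
# Not the paper's code; all architecture choices here are placeholders.
import torch
import torch.nn as nn

class RaytracedMoESketch(nn.Module):
    def __init__(self, dim=64, n_experts=4, n_classes=10, max_steps=6):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_experts)]
        )
        self.router = nn.Linear(dim, n_experts)   # scores the candidate experts
        self.head = nn.Linear(dim, n_classes)     # prediction can be read after any step
        self.max_steps = max_steps

    def forward(self, x):
        h = x
        outputs = []                               # predictions refine step by step
        for _ in range(self.max_steps):
            probs = torch.softmax(self.router(h), dim=-1)
            idx = torch.multinomial(probs, 1).squeeze(-1)        # sample next expert
            h = torch.stack([self.experts[i](h[b]) for b, i in enumerate(idx.tolist())])
            outputs.append(self.head(h))           # anytime prediction
        return outputs                              # later entries use more compute

model = RaytracedMoESketch()
preds = model(torch.randn(8, 64))
print(len(preds), preds[-1].shape)                  # 6 torch.Size([8, 10])
```

Training would unroll this loop and backpropagate through the sampled sequence, much as an RNN is trained through time; the sketch omits that, as well as any stopping criterion.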
Related papers
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
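As a hedged reading of how such a mixture might be dispatched at sampling time (the interval-based split is inferred from the paper's title, not stated in the summary above), each denoising timestep could be routed to a pruned sub-network responsible for its interval. The expert callables below are placeholders, not real pruned diffusion models.

```python
# Hypothetical interval-based dispatch of diffusion timesteps to pruned experts.
import bisect

def make_interval_router(boundaries, experts):
    """boundaries: sorted upper edges of timestep intervals; experts: one per interval."""
    assert len(boundaries) == len(experts)
    def denoise(x, t):
        k = bisect.bisect_left(boundaries, t)      # which interval does t fall in?
        return experts[min(k, len(experts) - 1)](x, t)
    return denoise

# Hypothetical usage: three pruned experts covering early/middle/late timesteps.
experts = [lambda x, t: f"small-net({x}, t={t})",
           lambda x, t: f"medium-net({x}, t={t})",
           lambda x, t: f"large-net({x}, t={t})"]
denoise = make_interval_router([333, 666, 1000], experts)
print(denoise("x_t", 50), denoise("x_t", 900))
```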
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- Iterative Sizing Field Prediction for Adaptive Mesh Generation From Expert Demonstrations [49.173541207550485]
Adaptive Meshing By Expert Reconstruction (AMBER) casts adaptive mesh generation as an imitation learning problem.
AMBER combines a graph neural network with an online data acquisition scheme to predict the projected sizing field of an expert mesh.
We experimentally validate AMBER on 2D meshes and 3D meshes provided by a human expert, closely matching the provided demonstrations and outperforming a single-step CNN baseline.
arXiv Detail & Related papers (2024-06-20T10:01:22Z)
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models.
We theoretically prove that prioritizing the pruning of the experts with a smaller change of the router's l2 norm from the pretrained model guarantees the preservation of test accuracy.
Although our theoretical analysis is centered on binary classification tasks on a simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
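A minimal sketch of the pruning criterion summarized above, assuming each expert has an associated router weight vector before and after fine-tuning; the shapes and keep-ratio are illustrative, not taken from the paper.

```python
# Prune the experts whose router weights moved least (smallest l2 change) during fine-tuning.
import torch

def prune_experts_by_router_shift(router_pre, router_post, keep_ratio=0.5):
    """router_pre/router_post: (n_experts, d) router weights before/after fine-tuning."""
    shift = torch.norm(router_post - router_pre, p=2, dim=1)   # per-expert l2 change
    n_keep = max(1, int(keep_ratio * router_pre.shape[0]))
    keep = torch.topk(shift, n_keep).indices                   # keep the largest shifts
    return torch.sort(keep).values                              # indices of surviving experts

pre, post = torch.randn(8, 16), torch.randn(8, 16)
print(prune_experts_by_router_shift(pre, post, keep_ratio=0.25))
```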
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
- AMEND: A Mixture of Experts Framework for Long-tailed Trajectory Prediction [6.724750970258851]
We propose a modular model-agnostic framework for trajectory prediction.
Each expert is trained to specialize in a particular part of the data.
To produce predictions, we utilise a router network that selects the best expert by generating relative confidence scores.
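A hedged sketch of that routing step: a small router network scores the specialized experts for an input and the highest-confidence expert makes the prediction. The expert and router architectures below are placeholders, not AMEND's.

```python
# Toy confidence-based router selecting one expert per sample.
import torch
import torch.nn as nn

class ConfidenceRouter(nn.Module):
    def __init__(self, in_dim, n_experts):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, n_experts))

    def forward(self, x):
        return torch.softmax(self.score(x), dim=-1)     # relative confidence per expert

experts = nn.ModuleList([nn.Linear(16, 2) for _ in range(3)])  # toy trajectory heads
router = ConfidenceRouter(16, len(experts))

x = torch.randn(4, 16)
conf = router(x)                                # (4, 3) confidence scores
best = conf.argmax(dim=-1)                      # pick the best expert per sample
pred = torch.stack([experts[int(e)](x[i]) for i, e in enumerate(best)])
print(conf.shape, pred.shape)                   # torch.Size([4, 3]) torch.Size([4, 2])
```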
arXiv Detail & Related papers (2024-02-13T02:43:41Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
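A minimal sketch of the cluster-conditional gating idea: images are first grouped into semantic clusters and the gate sends every image in a cluster to the same expert, so each expert only sees related data. The nearest-centroid clustering and toy experts below are stand-ins for MoCE's actual components.

```python
# Cluster-conditional gating: one expert per cluster of semantically similar inputs.
import torch
import torch.nn as nn

n_clusters, n_experts = 4, 4
cluster_to_expert = torch.arange(n_experts)      # e.g. one expert per cluster

def assign_cluster(features, centroids):
    # nearest-centroid assignment (the real method may use a learned clustering)
    d = torch.cdist(features, centroids)
    return d.argmin(dim=-1)

experts = nn.ModuleList([nn.Linear(32, 32) for _ in range(n_experts)])
centroids = torch.randn(n_clusters, 32)

feats = torch.randn(16, 32)
clusters = assign_cluster(feats, centroids)
routed = cluster_to_expert[clusters]             # cluster-conditional gate
out = torch.stack([experts[int(e)](feats[i]) for i, e in enumerate(routed)])
print(out.shape)                                 # torch.Size([16, 32])
```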
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- On Least Square Estimation in Softmax Gating Mixture of Experts [78.3687645289918]
We investigate the performance of the least squares estimators (LSE) under a deterministic MoE model.
We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions.
Our findings have important practical implications for expert selection.
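For reference, a generic formulation of a softmax-gating MoE regression model and the least squares estimator studied in such settings; the notation is illustrative and may differ from the paper's.

```latex
% Softmax-gating MoE regression model with K experts (generic notation).
\[
  f(x) \;=\; \sum_{k=1}^{K} \frac{\exp(\beta_k^{\top} x + b_k)}
  {\sum_{j=1}^{K} \exp(\beta_j^{\top} x + b_j)}\; h_k(x; \eta_k),
  \qquad y_i = f(x_i) + \varepsilon_i .
\]
% Least squares estimator over the MoE function class F.
\[
  \widehat{f}_n \;=\; \arg\min_{f \in \mathcal{F}} \;\sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 .
\]
```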
arXiv Detail & Related papers (2024-02-05T12:31:18Z)
- Mixtures of Gaussian process experts based on kernel stick-breaking processes [0.6396288020763143]
We propose a new mixture model of Gaussian process experts based on kernel stick-breaking processes.
Our model maintains the intuitive appeal of existing models yet improves their performance.
The model behaviour and improved predictive performance are demonstrated in experiments using six datasets.
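For reference, a generic form of kernel stick-breaking mixture weights over Gaussian process experts; the paper's exact kernels and priors may differ.

```latex
% Kernel stick-breaking mixture weights: each expert k is a Gaussian process, and its
% weight at input x is built from kernel-modulated stick breaks (generic form).
\[
  \pi_k(x) \;=\; v_k\, K(x, \mu_k) \prod_{j < k} \bigl(1 - v_j\, K(x, \mu_j)\bigr),
  \qquad v_k \sim \mathrm{Beta}(1, \alpha),
\]
\[
  p(y \mid x) \;=\; \sum_{k \ge 1} \pi_k(x)\, p\bigl(y \mid x, f_k\bigr),
  \qquad f_k \sim \mathcal{GP}(0, \kappa_k).
\]
```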
arXiv Detail & Related papers (2023-04-26T21:23:01Z)
- On the Representation Collapse of Sparse Mixture of Experts [102.83396489230375]
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead.
It employs a routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations.
However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse.
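A minimal sketch of the routing mechanism being discussed: each expert has an embedding that acts like a centroid, and tokens are dispatched to the expert whose embedding best matches their hidden representation; training this matching score is what pulls token representations toward the expert embeddings. Shapes below are illustrative.

```python
# Top-1 token-to-expert routing via dot products with expert embeddings ("centroids").
import torch

def top1_route(hidden, expert_emb):
    """hidden: (n_tokens, d); expert_emb: (n_experts, d)."""
    scores = hidden @ expert_emb.t()                 # match tokens to experts
    probs = torch.softmax(scores, dim=-1)
    idx = probs.argmax(dim=-1)                       # best-matched expert per token
    return idx, probs

hidden, expert_emb = torch.randn(10, 8), torch.randn(4, 8)
idx, probs = top1_route(hidden, expert_emb)
print(idx.tolist())
```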
arXiv Detail & Related papers (2022-04-20T01:40:19Z)
- A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, which penalizes only a subset of parameters rather than all of them (the original form, full regularization), to conduct model compression at higher speed.
Experimental results show that, as expected, computational complexity is reduced, with lower running time observed in almost all situations.
Surprisingly, it helps to improve some important metrics such as regression fitting results and classification accuracy in both training and test phases on multiple datasets.
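A hedged sketch of partial regularization as described: the weight penalty is applied only to a chosen subset of parameters rather than to all of them. Which parameters to penalize and the penalty strength are illustrative choices, not the paper's.

```python
# L2 penalty restricted to a subset of parameters (here, only the first layer).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
penalized = [p for name, p in model.named_parameters()
             if name.startswith("0.")]            # e.g. regularize only the first layer

def loss_fn(logits, targets, lam=1e-3):
    ce = nn.functional.cross_entropy(logits, targets)
    partial_l2 = sum((p ** 2).sum() for p in penalized)   # penalty over the subset only
    return ce + lam * partial_l2

x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))
print(loss_fn(model(x), y).item())
```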
arXiv Detail & Related papers (2020-09-03T00:38:27Z)
- Fast Deep Mixtures of Gaussian Process Experts [0.6554326244334868]
Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context.
In this article, we propose to design the gating network for selecting the experts from sparse GPs using a deep neural network (DNN).
A fast one-pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly.
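A schematic sketch of the one-pass Cluster-Classify-Regress recipe: cluster the data, fit a classifier (standing in for the DNN gating network) from inputs to clusters, then fit one regressor per cluster (standing in for the sparse GP experts). scikit-learn components replace the paper's DNN and GPs here.

```python
# Cluster-Classify-Regress, sketched with simple scikit-learn stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0, 2.0 * X[:, 1], -3.0 * X[:, 1]) + 0.1 * rng.normal(size=200)

# 1) Cluster: group points by joint (input, response) behaviour.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.column_stack([X, y]))
# 2) Classify: the gating model predicts a cluster from the input alone.
gate = LogisticRegression().fit(X, labels)
# 3) Regress: one expert per cluster, fitted on its own points.
experts = {k: Ridge().fit(X[labels == k], y[labels == k]) for k in np.unique(labels)}

def predict(X_new):
    k = gate.predict(X_new)
    return np.array([experts[ki].predict(x[None])[0] for ki, x in zip(k, X_new)])

print(predict(X[:5]).round(2))
```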
arXiv Detail & Related papers (2020-06-11T18:52:34Z)