Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
- URL: http://arxiv.org/abs/2109.11817v1
- Date: Fri, 24 Sep 2021 09:02:12 GMT
- Title: Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts
- Authors: Wouter Kool, Chris J. Maddison and Andriy Mnih
- Abstract summary: Training large-scale mixture of experts models efficiently requires assigning datapoints in a batch to different experts, each with a limited capacity.
Recently proposed assignment procedures lack a probabilistic interpretation and use biased estimators for training.
We propose two unbiased estimators based on principled assignment procedures: one that skips datapoints which exceed expert capacity, and one that samples perfectly balanced assignments.
- Score: 32.43213645631101
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training large-scale mixture of experts models efficiently on modern hardware
requires assigning datapoints in a batch to different experts, each with a
limited capacity. Recently proposed assignment procedures lack a probabilistic
interpretation and use biased estimators for training. As an alternative, we
propose two unbiased estimators based on principled stochastic assignment
procedures: one that skips datapoints which exceed expert capacity, and one
that samples perfectly balanced assignments using an extension of the
Gumbel-Matching distribution [29]. Both estimators are unbiased, as they
correct for the sampling procedure used. On a toy experiment, we find the
'skip' estimator is more effective than the balanced sampling one, and both
are more robust in solving the task than biased alternatives.
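The two assignment procedures lend themselves to a short illustration. The sketch below is a hypothetical reading of the abstract, not the authors' implementation: the function names, the random processing order in the skip procedure, and the slot-duplication trick used to reduce balanced assignment to a matching problem are all assumptions of this sketch, and the correction that makes the resulting gradient estimators unbiased (reweighting by the assignment probabilities) is omitted entirely.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def skip_assignment(logits, capacity, rng):
    """Sample one expert per datapoint from softmax(logits); a datapoint
    routed to an expert that is already full is skipped (marked -1).
    Processing datapoints in random order is an assumption of this sketch."""
    n, k = logits.shape
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    assignment = np.full(n, -1)          # -1 marks a skipped datapoint
    load = np.zeros(k, dtype=int)
    for i in rng.permutation(n):
        e = rng.choice(k, p=probs[i])
        if load[e] < capacity:
            assignment[i] = e
            load[e] += 1
    return assignment

def balanced_assignment(logits, capacity, rng):
    """Sample a perfectly balanced assignment: perturb the logits with
    i.i.d. Gumbel noise and take the maximum-weight matching of datapoints
    to expert slots (a Gumbel-Matching-style construction)."""
    n, k = logits.shape
    assert n <= k * capacity, "every datapoint needs a free expert slot"
    perturbed = logits + rng.gumbel(size=(n, k))
    slot_scores = np.repeat(perturbed, capacity, axis=1)  # one column per slot
    _, cols = linear_sum_assignment(-slot_scores)         # maximize total score
    return cols // capacity                               # slot index -> expert

# Example: 8 datapoints, 4 experts, capacity 2 (a perfectly balanced setting).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))
print(skip_assignment(logits, capacity=2, rng=rng))      # may contain -1 (skips)
print(balanced_assignment(logits, capacity=2, rng=rng))  # each expert appears twice
```

With n = k * capacity, the matching construction assigns exactly `capacity` datapoints to every expert by design, whereas the skip procedure drops whichever datapoints land on a full expert; this is the trade-off the toy experiment in the abstract compares.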
Related papers
- Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection [63.96018203905272]
We propose to reduce the sampling cost by pruning a pretrained diffusion model into a mixture of efficient experts.
We demonstrate the effectiveness of our method, DiffPruning, across several datasets.
arXiv Detail & Related papers (2024-09-23T21:27:26Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Doubly Calibrated Estimator for Recommendation on Data Missing Not At Random [20.889464448762176]
We argue that existing estimators rely on miscalibrated imputed errors and propensity scores.
We propose a Doubly Calibrated Estimator that involves the calibration of both the imputation and propensity models.
arXiv Detail & Related papers (2024-02-26T05:08:52Z)
- Debiased Sample Selection for Combating Noisy Labels [24.296451733127956]
We propose a noIse-Tolerant Expert Model (ITEM) for debiased learning in sample selection.
Specifically, to mitigate the training bias, we design a robust network architecture that integrates with multiple experts.
By training on the mixture of two class-discriminative mini-batches, the model mitigates the effect of the imbalanced training set.
arXiv Detail & Related papers (2024-01-24T10:37:28Z)
- Twice Class Bias Correction for Imbalanced Semi-Supervised Learning [59.90429949214134]
We introduce a novel approach called Twice Class Bias Correction (TCBC).
We estimate the class bias of the model parameters during the training process.
We apply a secondary correction to the model's pseudo-labels for unlabeled samples.
arXiv Detail & Related papers (2023-12-27T15:06:36Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Balanced Product of Calibrated Experts for Long-Tailed Recognition [13.194151879344487]
Many real-world recognition problems are characterized by long-tailed label distributions.
In this work, we take an analytical approach and extend the notion of logit adjustment to ensembles to form a Balanced Product of Experts (BalPoE).
We show how to properly define these distributions and combine the experts in order to achieve unbiased predictions.
arXiv Detail & Related papers (2022-06-10T17:59:02Z)
- Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision [85.07855130048951]
We study a more practical task setting, called test-agnostic long-tailed recognition, where the training class distribution is long-tailed and the test class distribution is unknown.
We propose a new method, called Test-time Aggregating Diverse Experts (TADE), that trains diverse experts to excel at handling different test distributions.
We theoretically show that our method has provable ability to simulate unknown test class distributions.
arXiv Detail & Related papers (2021-07-20T04:10:31Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the one-class nature of the problem: only positive examples are observed.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)