MoCaE: Mixture of Calibrated Experts Significantly Improves Object
Detection
- URL: http://arxiv.org/abs/2309.14976v4
- Date: Thu, 1 Feb 2024 12:51:34 GMT
- Title: MoCaE: Mixture of Calibrated Experts Significantly Improves Object
Detection
- Authors: Kemal Oksuz and Selim Kuzucu and Tom Joy and Puneet K. Dokania
- Abstract summary: We find that naïvely combining expert object detectors in a similar way to Deep Ensembles can often lead to degraded performance.
We identify that the primary cause of this issue is that the predictions of the experts do not match their performance.
To address this, when constructing the Mixture of Experts, we propose to combine their predictions in a manner which reflects the individual performance of the experts.
- Score: 18.059899772411033
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Combining the strengths of many existing predictors to obtain a Mixture of
Experts which is superior to its individual components is an effective way to
improve the performance without having to develop new architectures or train a
model from scratch. However, surprisingly, we find that naïvely combining
expert object detectors in a similar way to Deep Ensembles can often lead to
degraded performance. We identify that the primary cause of this issue is that
the predictions of the experts do not match their performance, a phenomenon
referred to as miscalibration. Consequently, the most confident detector dominates the
final predictions, preventing the mixture from leveraging all the predictions
from the experts appropriately. To address this, when constructing the Mixture
of Experts, we propose to combine their predictions in a manner which reflects
the individual performance of the experts; an objective we achieve by first
calibrating the predictions before filtering and refining them. We term this
approach the Mixture of Calibrated Experts and demonstrate its effectiveness
through extensive experiments on 5 different detection tasks using a variety of
detectors, showing that it: (i) improves object detectors on COCO and instance
segmentation methods on LVIS by up to $\sim 2.5$ AP; (ii) reaches
state-of-the-art on COCO test-dev with $65.1$ AP and on DOTA with $82.62$
$\mathrm{AP_{50}}$; (iii) outperforms single models consistently on recent
detection tasks such as Open Vocabulary Object Detection.
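To make the calibrate-then-combine idea above concrete, here is a minimal Python sketch. It assumes a Platt-style sigmoid as the per-detector calibration map and plain greedy NMS as the filtering step; the function names, the calibration form, and the merge rule are illustrative assumptions, not the paper's exact procedure.
```python
import numpy as np

def platt_calibrate(scores, a, b):
    """Map raw confidences to calibrated ones with a fitted Platt-style sigmoid (assumed form)."""
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def mixture_of_calibrated_experts(expert_outputs, calib_params, iou_thr=0.5):
    """Calibrate each expert's scores, pool all boxes, then filter with greedy NMS."""
    boxes, scores = [], []
    for (b, s), (a, c) in zip(expert_outputs, calib_params):
        boxes.append(b)
        scores.append(platt_calibrate(s, a, c))          # per-expert calibration
    boxes, scores = np.concatenate(boxes), np.concatenate(scores)
    order, keep = np.argsort(-scores), []
    while order.size:
        i, order = order[0], order[1:]
        keep.append(i)
        order = order[iou(boxes[i], boxes[order]) < iou_thr]   # suppress overlapping boxes
    return boxes[keep], scores[keep]

# Usage: two "experts", each returning (boxes, raw_scores) for the same image
det_a = (np.array([[10, 10, 50, 50], [12, 11, 52, 49]]), np.array([0.95, 0.90]))
det_b = (np.array([[11, 12, 51, 50], [60, 60, 90, 90]]), np.array([0.40, 0.35]))
print(mixture_of_calibrated_experts([det_a, det_b], calib_params=[(1.0, -0.5), (3.0, 0.0)]))
```
The point of the sketch is the order of operations: each expert's raw confidences are first mapped through its own calibration function, so that no single over-confident detector dominates the pooled ranking that the subsequent filtering step operates on.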
Related papers
- Unveiling Hidden Collaboration within Mixture-of-Experts in Large Language Models [5.211806751260724]
We propose a hierarchical sparse dictionary learning (HSDL) method that uncovers the collaboration patterns among experts.
We also introduce the Contribution-Aware Expert Pruning (CAEP) algorithm, which effectively prunes low-contribution experts.
arXiv Detail & Related papers (2025-04-16T04:06:15Z) - Epistemic Uncertainty-aware Recommendation Systems via Bayesian Deep Ensemble Learning [2.3310092106321365]
We propose an ensemble-based supermodel to generate more robust and reliable predictions.
We also introduce a new interpretable non-linear matching approach for the user and item embeddings.
arXiv Detail & Related papers (2025-04-14T23:04:35Z) - Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations [48.890534958441016]
We investigate domain specialization and expert redundancy in large-scale MoE models.
We propose a simple yet effective pruning framework, EASY-EP, to identify and retain only the most relevant experts.
Our method achieves comparable performance and 2.99× throughput under the same memory budget as the full DeepSeek-R1, while keeping only half of the experts.
arXiv Detail & Related papers (2025-04-09T11:34:06Z) - Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations [86.90549830760513]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks.
We propose MoE Experts Compression Suite (MC-Suite) to provide a benchmark for estimating expert importance from diverse perspectives.
We present an experimentally validated conjecture that, during expert dropping, SMoEs' instruction-following capabilities are predominantly hurt.
arXiv Detail & Related papers (2025-04-08T00:49:08Z) - Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework to advance the efficiency and scalability of machine learning models.
Central to the success of MoE is an adaptive softmax gating mechanism, which determines the relevance of each expert to a given input and then dynamically assigns each expert its weight.
We perform a convergence analysis of parameter estimation and expert estimation under the MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating.
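As a generic illustration of the softmax gating mechanism described above (not code from the cited paper), the sketch below routes each input through a weighted combination of experts, with an optional top-k step for the dense-to-sparse variant; all names and shapes are illustrative assumptions.
```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, gate_w, experts, top_k=None):
    """Weight each expert's output by a softmax gate; optionally keep only the top-k experts."""
    logits = x @ gate_w                      # (batch, num_experts) gating logits
    weights = softmax(logits)                # relevance of each expert to each input
    if top_k is not None:                    # dense-to-sparse variant: zero out all but top-k
        drop = np.argsort(-weights, axis=-1)[:, top_k:]
        np.put_along_axis(weights, drop, 0.0, axis=-1)
        weights = weights / weights.sum(axis=-1, keepdims=True)
    outputs = np.stack([f(x) for f in experts], axis=1)   # (batch, num_experts, dim)
    return np.einsum('be,bed->bd', weights, outputs)

# Usage: two random linear experts on 4-d inputs, top-1 routing
rng = np.random.default_rng(0)
experts = [lambda z, W=rng.normal(size=(4, 4)): z @ W for _ in range(2)]
x = rng.normal(size=(3, 4))
print(moe_forward(x, gate_w=rng.normal(size=(4, 2)), experts=experts, top_k=1).shape)  # (3, 4)
```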
arXiv Detail & Related papers (2025-03-05T06:11:24Z) - Leveraging Mixture of Experts for Improved Speech Deepfake Detection [53.69740463004446]
Speech deepfakes pose a significant threat to personal security and content authenticity.
We introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.
arXiv Detail & Related papers (2024-09-24T13:24:03Z) - UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection [18.25576487115016]
This paper focuses on Human-Object Interaction (HOI) detection.
It addresses the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame.
We propose a novel approach, UAHOI (Uncertainty-aware Robust Human-Object Interaction Learning).
arXiv Detail & Related papers (2024-08-14T10:06:39Z) - Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number, or even just one expert, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
arXiv Detail & Related papers (2024-03-26T05:48:02Z) - Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization [51.98792406392873]
Mixture of Experts (MoE) provides a powerful way to decompose dense layers into smaller, modular computations.
A major challenge lies in the computational cost of scaling the number of experts high enough to achieve fine-grained specialization.
We propose the Multilinear Mixture of Experts (µMoE) layer to address this, focusing on vision models.
arXiv Detail & Related papers (2024-02-19T21:20:22Z) - Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z) - Incorporating Experts' Judgment into Machine Learning Models [2.5363839239628843]
In some cases, domain experts might have a judgment about the expected outcome that might conflict with the prediction of machine learning models.
We present a novel framework that aims at leveraging experts' judgment to mitigate the conflict.
arXiv Detail & Related papers (2023-04-24T07:32:49Z) - A Review of Uncertainty Calibration in Pretrained Object Detectors [5.440028715314566]
We investigate the uncertainty calibration properties of different pretrained object detection architectures in a multi-class setting.
We propose a framework to ensure a fair, unbiased, and repeatable evaluation.
We deliver novel insights into why poor detector calibration emerges.
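For context, detector calibration of the kind this review evaluates is often summarized by an expected-calibration-error (ECE) style measure: detections are binned by confidence, and the mean confidence per bin is compared with the fraction of detections that actually match a ground-truth object. The sketch below is a generic version of that computation, not the review's exact protocol (which involves specific matching, binning, and classwise choices).
```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin detections by confidence and compare mean confidence to empirical precision."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)      # 1 if the detection matched a GT box, else 0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap                # weight by fraction of detections in the bin
    return ece

# Usage: scores vs. whether each detection matched a ground-truth box
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0]))
```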
arXiv Detail & Related papers (2022-10-06T14:06:36Z) - ERA: Expert Retrieval and Assembly for Early Action Prediction [13.721609856985376]
Early action prediction aims to successfully predict the class label of an action before it is completely performed.
This is a challenging task because the beginning stages of different actions can be very similar.
We propose a novel Expert Retrieval and Assembly (ERA) module that retrieves and assembles a set of experts specialized in using subtle differences.
arXiv Detail & Related papers (2022-07-20T06:09:26Z) - Trustworthy Long-Tailed Classification [41.45744960383575]
We propose a Trustworthy Long-tailed Classification (TLC) method to jointly conduct classification and uncertainty estimation.
Our TLC obtains the evidence-based uncertainty (EvU) and evidence for each expert, and then combines these uncertainties and evidences under the Dempster-Shafer Evidence Theory (DST).
The experimental results show that the proposed TLC outperforms the state-of-the-art methods and is trustworthy with reliable uncertainty.
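For context, the DST combination step the summary mentions is typically an application of Dempster's rule: two experts' mass functions over the same set of hypotheses are multiplied on intersecting hypothesis sets and renormalized by the conflicting mass. The sketch below is a generic toy implementation, not the TLC paper's exact scheme.
```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset_of_hypotheses: mass} via Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                     # mass falling on the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Usage: two experts over classes {cat, dog}, each keeping some mass on "don't know"
A, B, U = frozenset({'cat'}), frozenset({'dog'}), frozenset({'cat', 'dog'})
m1 = {A: 0.6, B: 0.1, U: 0.3}
m2 = {A: 0.5, B: 0.2, U: 0.3}
print(dempster_combine(m1, m2))
```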
arXiv Detail & Related papers (2021-11-17T10:52:36Z) - Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv Detail & Related papers (2021-06-22T18:29:58Z) - Probabilistic Ranking-Aware Ensembles for Enhanced Object Detections [50.096540945099704]
We propose a novel ensemble called the Probabilistic Ranking Aware Ensemble (PRAE) that refines the confidence of bounding boxes from detectors.
We also introduce a bandit approach to address the confidence imbalance problem caused by the need to deal with different numbers of boxes.
arXiv Detail & Related papers (2021-05-07T09:37:06Z)