Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
- URL: http://arxiv.org/abs/2507.17436v1
- Date: Wed, 23 Jul 2025 11:51:06 GMT
- Title: Dynamic-DINO: Fine-Grained Mixture of Experts Tuning for Real-time Open-Vocabulary Object Detection
- Authors: Yehao Lu, Minghe Weng, Zekang Xiao, Rui Jiang, Wei Su, Guangcong Zheng, Ping Lu, Xi Li
- Abstract summary: We propose Dynamic-DINO, which extends Grounding DINO 1.5 Edge from a dense model to a dynamic inference framework via an efficient MoE-Tuning strategy. We also design a granularity decomposition mechanism to decompose the Feed-Forward Network (FFN) of the base model into multiple smaller expert networks. Experiments show that, pretrained with merely 1.56M open-source data, Dynamic-DINO outperforms Grounding DINO 1.5 Edge, pretrained on the private Grounding20M dataset.
- Score: 10.639484582036088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Mixture of Experts (MoE) architecture has excelled in Large Vision-Language Models (LVLMs), yet its potential in real-time open-vocabulary object detectors, which also leverage large-scale vision-language datasets but smaller models, remains unexplored. This work investigates this domain, revealing intriguing insights. In the shallow layers, experts tend to cooperate with diverse peers to expand the search space, while in the deeper layers fixed collaborative structures emerge, where each expert maintains 2-3 fixed partners and distinct expert combinations specialize in processing specific patterns. Concretely, we propose Dynamic-DINO, which extends Grounding DINO 1.5 Edge from a dense model to a dynamic inference framework via an efficient MoE-Tuning strategy. Additionally, we design a granularity decomposition mechanism to decompose the Feed-Forward Network (FFN) of the base model into multiple smaller expert networks, expanding the subnet search space. To prevent performance degradation at the start of fine-tuning, we further propose a pre-trained weight allocation strategy for the experts, coupled with a specific router initialization. During inference, only the input-relevant experts are activated to form a compact subnet. Experiments show that, pretrained with merely 1.56M open-source data, Dynamic-DINO outperforms Grounding DINO 1.5 Edge, pretrained on the private Grounding20M dataset.
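To make the two mechanisms the abstract names more concrete, here is a minimal pure-Python sketch: splitting a dense FFN's hidden units into several smaller experts, then activating only the top-k input-relevant experts at inference. All function names, sizes, and the toy bias-free ReLU FFN are illustrative assumptions, not the paper's actual implementation.

```python
import math

def split_ffn(w_in, num_experts):
    """Split an FFN's hidden units into equal-sized expert slices,
    so the union of experts covers the original dense FFN."""
    step = len(w_in) // num_experts
    return [w_in[i * step:(i + 1) * step] for i in range(num_experts)]

def expert_forward(expert, x):
    # Toy bias-free FFN: sum of ReLU(w . x) over the expert's hidden units.
    return sum(max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in expert)

def moe_forward(experts, router, x, k=2):
    """Route x to the top-k experts by router score; mix their outputs."""
    scores = [sum(r * xi for r, xi in zip(row, x)) for row in router]
    topk = sorted(range(len(experts)), key=lambda i: -scores[i])[:k]
    # Softmax over the selected scores only, as in standard top-k gating.
    z = [math.exp(scores[i]) for i in topk]
    gates = [v / sum(z) for v in z]
    return sum(g * expert_forward(experts[i], x) for g, i in zip(gates, topk))
```

With k equal to the number of experts this reduces to the dense FFN up to gating; with small k only a compact subnet runs per input, which is the source of the inference savings.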
Related papers
- Enhanced DeepONet for 1-D consolidation operator learning: an architectural investigation [1.1743167854433305]
Deep Operator Networks (DeepONets) have emerged as a powerful surrogate modeling framework for learning solution operators in PDE-governed systems. This study systematically evaluates several DeepONet architectures for the one-dimensional consolidation problem.
arXiv Detail & Related papers (2025-07-14T15:09:58Z)
- MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization [0.0]
MoE-MLoRA is a mixture-of-experts framework where each expert is first trained independently to specialize in its domain. We evaluate MoE-MLoRA across eight CTR models on Movielens and Taobao.
arXiv Detail & Related papers (2025-06-09T09:03:05Z)
- On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating [75.29576838162714]
DeepSeekMoE stands out because of two unique features: the deployment of a shared expert strategy and the use of a normalized sigmoid gating mechanism. We perform a convergence analysis of the expert estimation task to highlight the gains in sample efficiency from both the shared expert strategy and the normalized sigmoid gating.
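A toy sketch of the two features named above, under assumed forms: a shared expert that bypasses the router entirely, and per-expert sigmoid gate scores renormalized to sum to one over the selected experts. Names and shapes here are illustrative, not DeepSeek's code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def normalized_sigmoid_gates(logits, k):
    """Top-k sigmoid scores, renormalized so the selected gates sum to 1."""
    scores = [sigmoid(v) for v in logits]
    topk = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    total = sum(scores[i] for i in topk)
    return {i: scores[i] / total for i in topk}

def moe_with_shared(shared_fn, expert_fns, logits, x, k=2):
    """Shared expert always runs; routed experts are gated and summed."""
    gates = normalized_sigmoid_gates(logits, k)
    routed = sum(g * expert_fns[i](x) for i, g in gates.items())
    return shared_fn(x) + routed  # the shared expert bypasses the router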
arXiv Detail & Related papers (2025-05-16T04:58:18Z)
- Domain-Specific Pruning of Large Mixture-of-Experts Models with Few-shot Demonstrations [48.890534958441016]
This study investigates domain specialization and expert redundancy in large-scale MoE models. We propose a simple yet effective pruning framework, EASY-EP, to identify and retain only the most relevant experts. Experiments on DeepSeek-R1 and DeepSeek-V3-0324 show that our method achieves comparable performance and $2.99\times$ the throughput of the full model under the same memory budget with only half the experts.
arXiv Detail & Related papers (2025-04-09T11:34:06Z)
- Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts [82.74439280067492]
Finedeep is a deep-layered fine-grained expert architecture for dense models. Our framework partitions the feed-forward neural network layers of traditional dense models into small experts. A novel routing mechanism is proposed to determine each expert's contribution.
arXiv Detail & Related papers (2025-02-18T15:09:58Z)
- A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts [49.394145046409044]
This paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models.
We theoretically prove that prioritizing the pruning of the experts with a smaller change in the router's l2 norm from the pretrained model guarantees the preservation of test accuracy.
Although our theoretical analysis is centered on binary classification tasks with a simplified MoE architecture, our expert pruning method is verified on large vision MoE models.
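The pruning criterion above can be sketched in a few lines. This is an assumed toy form of the rule (prune the experts whose router weight vectors drifted least, in l2 norm, from the pretrained model), not the paper's code; variable names are illustrative.

```python
import math

def l2_change(pre, post):
    """l2 norm of the change between two router weight vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pre, post)))

def experts_to_prune(pre_router, post_router, n_prune):
    """Indices of the n_prune experts whose router weights moved the least
    during fine-tuning; these are pruned first under the stated criterion."""
    drift = [l2_change(p, q) for p, q in zip(pre_router, post_router)]
    return sorted(range(len(drift)), key=lambda i: drift[i])[:n_prune]
```

The intuition, as described above, is that experts whose router weights barely moved during fine-tuning contribute least to the adapted model, so removing them preserves test accuracy.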
arXiv Detail & Related papers (2024-05-26T17:52:58Z)
- Ensemble and Mixture-of-Experts DeepONets For Operator Learning [4.604003661048267]
We present a novel deep operator network (DeepONet) architecture for operator learning. The ensemble DeepONet allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture.
arXiv Detail & Related papers (2024-05-20T09:42:44Z) - Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training [73.90260246781435]
We present Lory, the first approach that scales such architectures to autoregressive language model pre-training.
We show significant performance gains over parameter-matched dense models on both perplexity and a variety of downstream tasks.
Despite segment-level routing, Lory models achieve competitive performance compared to state-of-the-art MoE models with token-level routing.
arXiv Detail & Related papers (2024-05-06T03:06:33Z) - Harder Tasks Need More Experts: Dynamic Routing in MoE Models [58.18526590138739]
We introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models.
Our method dynamically selects experts based on the confidence level in expert selection for each input.
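A hedged sketch of one plausible form of such confidence-based routing: experts are activated in descending gate probability until their cumulative probability exceeds a threshold p, so low-confidence (harder) inputs activate more experts. The details of this mechanism are assumptions for illustration, not the paper's exact method.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    z = [math.exp(x - m) for x in xs]
    s = sum(z)
    return [v / s for v in z]

def dynamic_select(logits, p=0.7):
    """Activate experts in descending gate probability until their
    cumulative probability reaches p; returns the chosen expert indices."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probs[i]
        if total >= p:
            break
    return chosen
```

A confident (peaked) gate distribution satisfies the threshold with a single expert, while a flat distribution pulls in several, which matches the "harder tasks need more experts" behavior described above.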
arXiv Detail & Related papers (2024-03-12T13:41:15Z) - Improving Expert Specialization in Mixture of Experts [0.7366405857677227]
Mixture of experts (MoE) is the simplest gated modular neural network architecture.
We show that the original MoE architecture and its training method do not guarantee intuitive task decompositions and good expert utilization.
We introduce a novel gating architecture, similar to attention, that improves performance and results in a lower entropy task decomposition.
arXiv Detail & Related papers (2023-02-28T16:16:45Z) - Diversified Dynamic Routing for Vision Tasks [36.199659460868496]
We propose a novel architecture where each layer is composed of a set of experts.
In our method, the model is explicitly trained to solve the challenge of finding relevant partitioning of the data.
We conduct several experiments on semantic segmentation on Cityscapes and object detection and instance segmentation on MS-COCO.
arXiv Detail & Related papers (2022-09-26T23:27:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.