Related papers: Err

Related papers

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention [18.000894283686176]
Vision large language models (VLLMs) are focusing on handling complex and fine-grained visual information by incorporating advanced vision encoders and scaling up visual models.<n>In this work, we propose a novel visual framework, MoCHA, to address these issues.<n>Our framework integrates four vision backbones (i.e., CLIP, SigLIP, DINOv2 and ConvNeXt) to extract complementary visual features and is equipped with a sparse Mixture of Experts Connectors (MoECs) module.<n>To mitigate redundant or insufficient use of the visual information encoded by the MoECs module, we
arXiv Detail & Related papers (2025-07-30T16:15:22Z)
CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning [5.161314094237747]
We propose Contrastive Representation for MoE (CoMoE) to promote modularization and specialization in MoE.<n>Experiments on several benchmarks and in multi-task settings demonstrate that CoMoE can consistently enhance MoE's capacity and promote modularization among the experts.
arXiv Detail & Related papers (2025-05-23T06:58:44Z)
OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning [3.8813502422318127]
Building mixture-of-experts (MoE) architecture for Low-rank adaptation (LoRA) is emerging as a potential direction in parameter-efficient fine-tuning (PEFT) We first conduct qualitative analysis to indicate that experts collapse to similar representations in vanilla MoE, limiting the capacity of modular design and computational efficiency. Motivated by these findings, we propose Orthogonal Mixture-of-Experts (OMoE) Our method is simple and alleviates memory bottlenecks, as it incurs minimal experts compared to vanilla MoE models.
arXiv Detail & Related papers (2025-01-17T09:27:08Z)
A Survey on Mixture of Experts [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead. MoE has emerged as an effective method for substantially scaling up model capacity with minimal overhead. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z)
A Closer Look into Mixture-of-Experts in Large Language Models [26.503570706063634]
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance. MoE architecture could increase the model size without sacrificing computational efficiency. We make an initial attempt to understand the inner workings of MoE-based large language models.
arXiv Detail & Related papers (2024-06-26T10:07:57Z)
Theory on Mixture-of-Experts in Continual Learning [72.42497633220547]
Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks. MoE model has recently been shown to effectively mitigate catastrophic forgetting in CL, by employing a gating network.
arXiv Detail & Related papers (2024-06-24T08:29:58Z)
MoVA: Adapting Mixture of Vision Experts to Multimodal Context [38.8308841469793]
We propose the MoVA, a powerful and novel MLLM, adaptively routing and fusing task-specific vision experts with a coarse-to-fine mechanism. In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts. In the fine-grained stage, we elaborately conduct the mixture-of-vision-expert adapter (MoV-Adapter) to extract and fuse task-specific knowledge.
arXiv Detail & Related papers (2024-04-19T17:59:48Z)
MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts [15.535613294871487]
We propose a method called Mixture-of-Distilled-Expert (MoDE) MoDE applies moderate mutual distillation among experts to enable each expert to pick up more features learned by other experts.
arXiv Detail & Related papers (2024-01-31T03:52:32Z)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models [49.32669226551026]
We propose a simple yet effective training strategy MoE-Tuning for LVLMs. MoE-LLaVA, a MoE-based sparse LVLM architecture, uniquely activates only the top-k experts through routers. Experiments show the significant performance of MoE-LLaVA in a variety of visual understanding and object hallucination benchmarks.
arXiv Detail & Related papers (2024-01-29T08:13:40Z)
Experts Weights Averaging: A New General Training Scheme for Vision Transformers [57.62386892571636]
We propose a training scheme for Vision Transformers (ViTs) that achieves performance improvement without increasing inference cost. During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs. After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into original ViT for inference.
arXiv Detail & Related papers (2023-08-11T12:05:12Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions. We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.