Related papers: LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

URL: http://arxiv.org/abs/2411.00918v1
Date: Fri, 01 Nov 2024 14:04:36 GMT
Title: LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models
Authors: Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
Abstract summary: LibMoE is a comprehensive framework to streamline the research, training, and evaluation of MoE algorithms. LibMoE makes MoE in large language models (LLMs) more accessible to a wide range of researchers by standardizing the training and evaluation pipelines.
Score: 7.164238322896674
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Mixture of Experts (MoE) models play an important role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work develops LibMoE, a comprehensive and modular framework to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles, (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoE in LLMs more accessible to a wide range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms over three different LLMs and 11 datasets under the zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform roughly similarly when averaged across a wide range of tasks. With the modular design and extensive evaluation, we believe LibMoE will be invaluable for researchers to make meaningful progress towards the next generation of MoE and LLMs. Project page: https://fsoft-aic.github.io/fsoft-LibMoE.github.io

Related papers

Teamwork makes the dream work: LLMs-Based Agents for GitHub README.MD Summarization [7.330697128881243]
We propose Metagente as a novel approach to amplify the synergy of various Large Language Models (LLMs). Metagente is a multi-agent framework based on a series of LLMs that self-optimizes the system through evaluation, feedback, and cooperation among specialized agents. The performance gain compared to GitSum, the most relevant benchmark, ranges from 27.63% to 60.43%.
arXiv Detail & Related papers (2025-03-13T20:42:39Z)
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
LLMBox: A Comprehensive Library for Large Language Models [109.15654830320553]
This paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of large language models (LLMs). The library has three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets, and models, and (3) more practical considerations, especially user-friendliness and efficiency.
arXiv Detail & Related papers (2024-07-08T02:39:33Z)
A Survey on Mixture of Experts [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z)
A Closer Look into Mixture-of-Experts in Large Language Models [26.503570706063634]
Mixture-of-experts (MoE) is gaining increasing attention due to its unique properties and remarkable performance. The MoE architecture can increase model size without sacrificing computational efficiency. We make an initial attempt to understand the inner workings of MoE-based large language models.
arXiv Detail & Related papers (2024-06-26T10:07:57Z)
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs [74.1976921342982]
This paper introduces UltraEval, a user-friendly evaluation framework characterized by its lightweight nature, comprehensiveness, modularity, and efficiency. The resulting composability allows for the free combination of different models, tasks, prompts, benchmarks, and metrics within a unified evaluation workflow.
arXiv Detail & Related papers (2024-04-11T09:17:12Z)
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models [44.848642930797155]
We release OpenMoE, a series of fully open-sourced and reproducible decoder-only Mixture-of-Experts (MoE) based large language models (LLMs). Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs. We find that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance.
arXiv Detail & Related papers (2024-01-29T12:05:02Z)
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models [49.32669226551026]
We propose a simple yet effective training strategy, MoE-Tuning, for LVLMs. MoE-LLaVA, a MoE-based sparse LVLM architecture, uniquely activates only the top-k experts through routers. Experiments show that MoE-LLaVA performs strongly on a variety of visual understanding and object hallucination benchmarks.
arXiv Detail & Related papers (2024-01-29T08:13:40Z)
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way [59.09824823710863]
CoLLiE is an efficient library that facilitates collaborative training of large language models. With its modular design and comprehensive functionality, CoLLiE offers a balanced blend of efficiency, ease of use, and customization.
arXiv Detail & Related papers (2023-12-01T08:02:16Z)
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE [83.00018517368973]
Large Language Models (LLMs) can extend their zero-shot capabilities to multimodal learning through instruction tuning. Negative conflicts and interference may have a worse impact on performance. We combine the well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning.
arXiv Detail & Related papers (2023-11-05T15:48:29Z)
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
Multimodal Large Language Models (MLLMs) rely on powerful LLMs to perform multimodal tasks. This paper presents the first comprehensive MLLM evaluation benchmark, MME. It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z)
FedML: A Research Library and Benchmark for Federated Machine Learning [55.09054608875831]
Federated learning (FL) is a rapidly growing research field in machine learning. Existing FL libraries cannot adequately support diverse algorithmic development. We introduce FedML, an open research library and benchmark to facilitate FL algorithm development and fair performance comparison.
arXiv Detail & Related papers (2020-07-27T13:02:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.