GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
- URL: http://arxiv.org/abs/2412.16216v3
- Date: Tue, 27 May 2025 02:34:28 GMT
- Title: GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
- Authors: Ting Bai, Yue Yu, Le Huang, Zenan Xu, Zhe Zhao, Chuan Shi
- Abstract summary: We introduce a novel MoE graph-based framework $\textbf{GMoE}$, aimed at enhancing the collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts. We put forward two coordination strategies in GMoE: the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$.
- Score: 39.302800055216764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) suffers from an inherent load-imbalance problem caused by the simplistic linear router strategy, which ultimately leads to instability and inefficient learning in LLMs. To address this challenge, we introduce a novel graph-based MoE framework, $\textbf{GMoE}$, aimed at enhancing the collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts, enabling every expert to dynamically allocate information derived from the input data by sharing information with its neighboring experts. Moreover, we put forward two coordination strategies in GMoE, the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$, to further unlock the capacity of each expert and increase model stability during LLM fine-tuning. Specifically, we leverage a parameter-efficient fine-tuning technique, i.e., Low-Rank Adaptation (LoRA), to implement the graph MoE architecture. Extensive experiments on four real-world benchmark datasets demonstrate the effectiveness of GMoE and the benefits of facilitating collaboration among multiple experts in LLM fine-tuning. The code for the experimental implementation is available at https://github.com/BAI-LAB/GMoE
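A minimal sketch may help make the abstract concrete: a MoE layer whose router propagates gating scores over an expert-collaboration graph before top-k selection, with each expert realized as a LoRA adapter. This is an illustrative PyTorch sketch under stated assumptions, not the authors' implementation: the names GraphRouterMoE, LoRAExpert, expert_adj, lora_rank, and top_k are hypothetical, and the Poisson- and Normal-distribution coordination strategies are omitted (the official code is at https://github.com/BAI-LAB/GMoE).

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert = a rank-r LoRA update (down-projection then up-projection)."""
    def __init__(self, d_model: int, lora_rank: int = 8):
        super().__init__()
        self.down = nn.Linear(d_model, lora_rank, bias=False)
        self.up = nn.Linear(lora_rank, d_model, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a zero update, as is common for LoRA

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))


class GraphRouterMoE(nn.Module):
    """Hypothetical graph-router MoE layer: gating logits are smoothed over a
    (learnable) expert-collaboration graph before top-k expert selection."""
    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([LoRAExpert(d_model) for _ in range(num_experts)])
        # Learnable expert adjacency; a fixed collaboration graph could be used instead.
        self.expert_adj = nn.Parameter(torch.eye(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.gate(x)                         # (batch, seq, num_experts)
        adj = F.softmax(self.expert_adj, dim=-1)      # row-normalized expert graph
        logits = logits @ adj.T                       # one propagation step: neighbors share signals
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Dense loop over experts for clarity only; real MoE code dispatches tokens sparsely.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (topk_idx[..., slot] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * topk_w[..., slot : slot + 1] * expert(x)
        return out


# Quick shape check: a (2, 16, 64) input comes back as (2, 16, 64).
layer = GraphRouterMoE(d_model=64)
print(layer(torch.randn(2, 16, 64)).shape)

The single adjacency-propagation step (logits @ adj.T) is where neighboring experts exchange routing signals; a full implementation would presumably add the paper's two coordination strategies as auxiliary objectives alongside the usual load-balancing loss.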
Related papers
- A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis [43.746749403268275]
Large Language Models (LLMs) suffer from high computational costs, environmental inefficiency, and potential biases inherited from monolithic architectures. We propose a collaborative framework, GRA, that aggregates specialized roles across small LLMs to generate high-quality, diverse, and reliable data. Our results challenge the necessity of monolithic large models for high-quality data synthesis, advocating instead for strategic coordination of smaller agents.
arXiv Detail & Related papers (2025-04-11T06:13:43Z) - Data-centric Federated Graph Learning with Large Language Models [34.224475952206404]
In federated graph learning (FGL), a complete graph is divided into multiple subgraphs stored in each client due to privacy concerns. A pain point of FGL is the heterogeneity problem, where nodes or structures present non-IID properties among clients. We propose a general framework that innovatively decomposes the task of large language models for FGL into two sub-tasks theoretically.
arXiv Detail & Related papers (2025-03-25T08:43:08Z) - A Hierarchical Language Model For Interpretable Graph Reasoning [47.460255447561906]
We introduce Hierarchical Language Model for Graph (HLM-G), which employs a two-block architecture to capture node-centric local information and interaction-centric global structure.
The proposed scheme allows LLMs to address various graph queries with high efficacy, efficiency, and robustness, while reducing computational costs on large-scale graph tasks.
Comprehensive evaluations across diverse graph reasoning and real-world tasks of node, link, and graph-levels highlight the superiority of our method.
arXiv Detail & Related papers (2024-10-29T00:28:02Z) - Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation [9.844598565914055]
Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge.
We introduce SubgraphRAG, extending the Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) framework that retrieves subgraphs.
Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval.
arXiv Detail & Related papers (2024-10-28T04:39:32Z) - Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free [21.59456761618456]
Large language models (LLMs) excel on generation tasks, but their decoder-only architecture often limits their potential as embedding models if no further representation finetuning is applied.
Our study shows that the expert routers in MoE LLMs can serve as an off-the-shelf embedding model with promising performance on a diverse class of embedding-focused tasks.
arXiv Detail & Related papers (2024-10-14T17:59:44Z) - Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents [27.4884498301785]
We introduce GraphAgent-Reasoner, a fine-tuning-free framework for explicit and precise graph reasoning.
Inspired by distributed graph computation theory, our framework decomposes graph problems into smaller, node-centric tasks that are distributed among multiple agents.
Our framework demonstrates the capability to handle real-world graph reasoning applications such as webpage importance analysis.
arXiv Detail & Related papers (2024-10-07T15:34:14Z) - How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension [53.6373473053431]
This work introduces a benchmark to assess large language models' capabilities in graph pattern tasks.
We have developed a benchmark that evaluates whether LLMs can understand graph patterns based on either terminological or topological descriptions.
Our benchmark encompasses both synthetic and real datasets, and a variety of models, with a total of 11 tasks and 7 models.
arXiv Detail & Related papers (2024-10-04T04:48:33Z) - How to Make LLMs Strong Node Classifiers? [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, such as Graph Neural Networks (GNNs) and Graph Transformers (GTs). We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art (SOTA) GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z) - Layerwise Recurrent Router for Mixture-of-Experts [42.36093735411238]
The Mixture-of-Experts (MoE) architecture stands out for its ability to scale model size without significantly increasing training costs. Current MoE models often display parameter inefficiency. We introduce the Layerwise Recurrent Router for Mixture-of-Experts (RMoE).
arXiv Detail & Related papers (2024-08-13T10:25:13Z) - Dual-Channel Latent Factor Analysis Enhanced Graph Contrastive Learning for Recommendation [2.9449497738046078]
Graph Neural Networks (GNNs) are powerful learning methods for recommender systems.
Recently, the integration of contrastive learning with GNNs has demonstrated remarkable performance in recommender systems.
This study proposes a latent factor analysis (LFA) enhanced GCL approach, named LFA-GCL.
arXiv Detail & Related papers (2024-08-09T03:24:48Z) - CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches the message-passing procedure of graph learning by enhancing a limited fraction of nodes from the graph.
arXiv Detail & Related papers (2024-07-20T22:09:42Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate and significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph [134.8631016845467]
We propose an autonomous LLM-based agent framework, called KG-Agent.
In KG-Agent, we integrate the LLM, multifunctional toolbox, KG-based executor, and knowledge memory.
To guarantee the effectiveness, we leverage program language to formulate the multi-hop reasoning process over the KG.
arXiv Detail & Related papers (2024-02-17T02:07:49Z) - Can we Soft Prompt LLMs for Graph Learning Tasks? [22.286189757942054]
GraphPrompter is a framework designed to align graph information with Large Language Models (LLMs) via soft prompts.
The framework unveils the substantial capabilities of LLMs as predictors in graph-related tasks.
arXiv Detail & Related papers (2024-02-15T23:09:42Z) - Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z) - Integrating Graphs with Large Language Models: Methods and Prospects [68.37584693537555]
Large language models (LLMs) have emerged as frontrunners, showcasing unparalleled prowess in diverse applications.
Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest.
This paper bifurcates such integrations into two predominant categories.
arXiv Detail & Related papers (2023-10-09T07:59:34Z)