LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading
- URL: http://arxiv.org/abs/2501.09636v2
- Date: Fri, 17 Jan 2025 11:44:53 GMT
- Title: LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading
- Authors: Kuan-Ming Liu, Ming-Chih Lo
- Abstract summary: We propose a novel framework that employs large language models (LLMs) as the router within the mixture-of-experts (MoE) architecture.
Specifically, we replace the conventional neural network-based router with LLMs, leveraging their extensive world knowledge and reasoning capabilities.
Our experiments on multimodal real-world stock datasets demonstrate that LLMoE outperforms state-of-the-art MoE models and other deep neural network approaches.
- Abstract: Recent advances in deep learning and large language models (LLMs) have facilitated the deployment of the mixture-of-experts (MoE) mechanism in the stock investment domain. While these models have demonstrated promising trading performance, they are often unimodal, neglecting the wealth of information available in other modalities, such as textual data. Moreover, the traditional neural network-based router selection mechanism fails to consider contextual and real-world nuances, resulting in suboptimal expert selection. To address these limitations, we propose LLMoE, a novel framework that employs LLMs as the router within the MoE architecture. Specifically, we replace the conventional neural network-based router with LLMs, leveraging their extensive world knowledge and reasoning capabilities to select experts based on historical price data and stock news. This approach provides a more effective and interpretable selection mechanism. Our experiments on multimodal real-world stock datasets demonstrate that LLMoE outperforms state-of-the-art MoE models and other deep neural network approaches. Additionally, the flexible architecture of LLMoE allows for easy adaptation to various downstream tasks.
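To make the routing step concrete, the sketch below shows how an LLM might act as the router in a trading MoE: it reads a window of prices plus recent headlines and returns a label that dispatches to one expert. The `query_llm` wrapper, the optimistic/pessimistic expert split, and the prompt wording are illustrative assumptions, not the paper's verified implementation.

```python
# Hedged sketch of LLM-as-router for a trading MoE. `query_llm`, the
# expert labels, and the prompt wording are assumptions for illustration,
# not LLMoE's actual implementation.
from typing import Callable, Dict, List


def build_router_prompt(prices: List[float], headlines: List[str]) -> str:
    """Fuse numeric and textual modalities into one routing prompt."""
    price_str = ", ".join(f"{p:.2f}" for p in prices)
    news_str = "\n".join(f"- {h}" for h in headlines)
    return (
        f"Recent closing prices: {price_str}\n"
        f"Recent headlines:\n{news_str}\n"
        "Answer with exactly one word, 'optimistic' or 'pessimistic', "
        "describing the market outlook."
    )


def llm_route(
    prices: List[float],
    headlines: List[str],
    experts: Dict[str, Callable[[List[float]], float]],
    query_llm: Callable[[str], str],  # hypothetical LLM API wrapper
) -> float:
    """Ask the LLM for an outlook label, then dispatch to that expert."""
    label = query_llm(build_router_prompt(prices, headlines)).strip().lower()
    label = label if label in experts else "optimistic"  # simple fallback
    return experts[label](prices)  # each expert emits a trading signal
```

Because the router's decision is a natural-language label, its choice is directly inspectable, which is the interpretability benefit the abstract claims over a neural gating network.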
Related papers
- GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism [20.765816590224787]
GRAPHMOE is a novel method aimed at augmenting the cognitive depth of language models via a self-rethinking mechanism constructed on Pseudo GraphMoE networks.
We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) and conduct extensive experiments on various benchmark datasets.
The experimental results reveal that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-01-14T06:59:51Z)
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose Read-ME, a novel framework that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM retains up to 85% of the source model's performance while delivering over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z)
- Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts.
Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data.
Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems [57.41621687431203]
Large language models (LLMs) and foundation models have been recently touted as a game-changer for 6G systems.
This paper presents a comprehensive vision on how to design universal foundation models tailored towards the deployment of artificial intelligence (AI)-native networks.
arXiv Detail & Related papers (2024-01-30T00:21:41Z)
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models [44.848642930797155]
We release OpenMoE, a series of fully open-sourced and reproducible decoder-only Mixture-of-Experts (MoE)-based large language models (LLMs).
Our investigation confirms that MoE-based LLMs can offer a more favorable cost-effectiveness trade-off than dense LLMs.
We find that routing decisions in MoE models are predominantly based on token IDs, with minimal context relevance; a simple probe of this behavior is sketched after this list.
arXiv Detail & Related papers (2024-01-29T12:05:02Z)
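To illustrate the OpenMoE routing finding referenced above, here is a minimal probe one could run on any MoE model: collect the router's top-1 expert choice for a single token ID across many different contexts and measure how often it agrees with the modal choice. High agreement suggests context-insensitive, token-ID-based routing. The function and the example numbers are illustrative assumptions, not OpenMoE's published analysis code.

```python
# Hedged sketch: probe whether MoE routing depends on token identity
# rather than context. How `expert_choices` is collected from a real
# model's gating layer is assumed, not shown.
import torch


def token_routing_agreement(expert_choices: torch.Tensor) -> float:
    """Given top-1 expert choices for the SAME token ID observed in many
    different contexts, return the fraction agreeing with the modal
    choice; values near 1.0 suggest token-ID-based routing."""
    modal_expert = expert_choices.mode().values
    return (expert_choices == modal_expert).float().mean().item()


# Usage with hypothetical data: the expert picked for the token "the"
# at its position in 8 unrelated sentences.
choices = torch.tensor([3, 3, 3, 3, 3, 1, 3, 3])
print(f"agreement = {token_routing_agreement(choices):.2f}")  # 0.88
```

A router that scores near 1.0 on such a probe behaves like a static token-to-expert lookup table, which is exactly the context-blindness that motivates replacing the neural gating network with an LLM router in the headline paper.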