Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
- URL: http://arxiv.org/abs/2406.19598v2
- Date: Thu, 17 Oct 2024 03:53:50 GMT
- Title: Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
- Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan
- Abstract summary: Large language models (LLMs) exhibit uneven awareness of different contextual positions.
We introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge.
MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy.
- Score: 51.65245442281049
- Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as position embeddings, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within LLMs and a lightweight router-only training optimization strategy. (1) MoICE views each RoPE angle as an "in-context" expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions. This approach mitigates the risk of overlooking essential contextual information. (2) The router-only training strategy entails freezing LLM parameters and exclusively updating routers for only a few steps. When applied to open-source LLMs including Llama and Mistral, MoICE surpasses prior methods across multiple tasks on long context understanding and generation, all while maintaining commendable inference efficiency.
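To make the mechanism concrete, here is a minimal PyTorch sketch of a MoICE-style attention head. It is not the authors' implementation: the class and parameter names, the set of RoPE bases, and the sequence-pooled routing input are all illustrative assumptions, the causal mask is omitted, and a dense softmax mixture stands in for the paper's dynamic per-head selection.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x, base):
    """Rotate x of shape (batch, seq, dim) with rotary embeddings of a given base."""
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32, device=x.device) / half)
    angles = torch.arange(x.shape[1], dtype=torch.float32, device=x.device)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class MoICEHead(nn.Module):
    """One attention head mixing several RoPE angles ("in-context experts")."""
    def __init__(self, dim, bases=(10_000.0, 40_000.0, 160_000.0)):  # bases are hypothetical
        super().__init__()
        self.bases = bases
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.router = nn.Linear(dim, len(bases))  # the only module trained in component (2)

    def forward(self, x):  # x: (batch, seq, dim); causal mask omitted for brevity
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Dense mixture over experts; the paper routes dynamically per head instead.
        gates = F.softmax(self.router(x.mean(dim=1)), dim=-1)  # (batch, n_experts)
        out = torch.zeros_like(v)
        for i, base in enumerate(self.bases):
            qi, ki = apply_rope(q, base), apply_rope(k, base)
            attn = F.softmax(qi @ ki.transpose(-2, -1) / qi.shape[-1] ** 0.5, dim=-1)
            out = out + gates[:, i, None, None] * (attn @ v)  # weighted expert mixture
        return out

# Router-only training (component 2): freeze everything, update only the routers.
head = MoICEHead(dim=64)
for name, p in head.named_parameters():
    p.requires_grad = name.startswith("router")
```

Because only the router weights receive gradients, the update touches a tiny fraction of the model's parameters, which is what keeps the method lightweight.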
Related papers
- DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search [37.16633337724158]
DOTS is an approach enabling LLMs to reason dynamically via optimal reasoning trajectory search.
Our method consistently outperforms static reasoning techniques and the vanilla instruction tuning approach.
arXiv Detail & Related papers (2024-10-04T18:58:09Z)
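The "reasoning trajectory search" in the DOTS entry above can be caricatured as trying several short action sequences and keeping the best-scoring answer; the actual method learns to predict the optimal trajectory rather than enumerating at inference time. A toy sketch under that assumption, with `llm` and `scorer` as hypothetical callables:

```python
from itertools import permutations

# Illustrative action names only; DOTS defines its own atomic reasoning actions.
ACTIONS = ["decompose the question", "verify each step", "answer directly"]

def trajectory_search(llm, scorer, question, max_len=2):
    """Enumerate short reasoning-action trajectories and keep the best answer."""
    best_answer, best_score = None, float("-inf")
    for n in range(1, max_len + 1):
        for traj in permutations(ACTIONS, n):
            answer = llm(f"Question: {question}\nFollow these steps: {'; '.join(traj)}")
            score = scorer(question, answer)
            if score > best_score:
                best_answer, best_score = answer, score
    return best_answer
```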
- Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach.
Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens.
We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z)
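The "soft separation token" in Bundle-MLLM's hybrid tokenization can be pictured as a learnable embedding spliced between text-token embeddings and projected non-textual features. A minimal sketch, assuming hypothetical module names and that media/relational features arrive as fixed-size vectors:

```python
import torch
import torch.nn as nn

class HybridItemTokenizer(nn.Module):
    """Sketch: fuse text-token embeddings with projected non-textual features,
    separated by a learnable soft separation token."""
    def __init__(self, text_vocab, dim, media_dim):
        super().__init__()
        self.text_emb = nn.Embedding(text_vocab, dim)
        self.media_proj = nn.Linear(media_dim, dim)      # map media features into LLM space
        self.sep = nn.Parameter(torch.randn(1, 1, dim))  # soft separation token

    def forward(self, text_ids, media_feats):
        # text_ids: (batch, t); media_feats: (batch, m, media_dim)
        text = self.text_emb(text_ids)
        media = self.media_proj(media_feats)
        sep = self.sep.expand(text.size(0), -1, -1)
        # Sequence layout: [text tokens] [SEP*] [non-text tokens], fed to the LLM.
        return torch.cat([text, sep, media], dim=1)
```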
- Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
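MRP's dynamic method selection can be approximated as a two-pass prompting loop: the model first names a reasoning method suited to the task, then solves the task under that method. A rough sketch, with `llm` standing in for any text-completion callable and the method list purely illustrative:

```python
# Hypothetical two-pass flow; not the paper's exact prompts or method pool.
REASONING_METHODS = ["chain-of-thought", "step-back prompting", "self-verification"]

def meta_reasoning_prompt(llm, task: str) -> str:
    # Pass 1: ask the model which reasoning method fits this task.
    choice = llm(
        f"Task: {task}\nPick the best-suited method from {REASONING_METHODS} "
        f"and answer with its name only."
    )
    # Pass 2: solve the task with the chosen method as the instruction.
    return llm(f"Solve the task using {choice.strip()}.\nTask: {task}")
```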
- NoteLLM-2: Multimodal Large Representation Models for Recommendation [60.17448025069594]
We investigate the potential of Large Language Models to enhance multimodal representation in multimodal item-to-item recommendations.
One feasible method is the transfer of Multimodal Large Language Models (MLLMs) for representation tasks.
We propose a novel training framework, NoteLLM-2, specifically designed for multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the capabilities required for tool use into a planner, a caller, and a summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
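The planner/caller/summarizer decomposition in the entry above maps naturally onto three separately served models. A minimal sketch of the control flow, with all role names and signatures hypothetical (the framework's actual interfaces are not specified here):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MultiLLMAgent:
    planner: Callable[[str], str]           # decides the next action / which tool to use
    caller: Callable[[str], str]            # formats and issues the actual tool call
    summarizer: Callable[[str, str], str]   # turns tool results into the final answer

    def run(self, task: str) -> str:
        plan = self.planner(task)
        observation = self.caller(plan)
        return self.summarizer(task, observation)
```

Because each role is a plain callable, any slot can be backed by a smaller specialized model and updated independently, which is the entry's central point.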
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Stance Detection with Collaborative Role-Infused LLM-Based Agents [39.75103353173015]
Stance detection is vital for content analysis in web and social media research.
However, stance detection requires advanced reasoning to infer authors' implicit viewpoints.
We design a three-stage framework in which LLMs are designated distinct roles.
We achieve state-of-the-art performance across multiple datasets.
arXiv Detail & Related papers (2023-10-16T14:46:52Z)
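The three-stage, role-infused design in the stance-detection entry above can be read as a fixed pipeline of differently prompted LLM calls whose outputs feed a final judgment. A sketch under that assumption; the specific roles and prompts below are invented for illustration:

```python
# Hypothetical roles and stage order; the paper defines its own role designs.
ROLES = {
    "linguist": "Analyze the text's rhetorical and linguistic cues.",
    "domain_expert": "Supply background knowledge about the target topic.",
    "debater": "Weigh both analyses and argue for a final stance.",
}

def detect_stance(llm, text: str, target: str) -> str:
    cues = llm(f"{ROLES['linguist']}\nText: {text}")
    knowledge = llm(f"{ROLES['domain_expert']}\nTarget: {target}")
    return llm(
        f"{ROLES['debater']}\nText: {text}\nTarget: {target}\n"
        f"Linguistic analysis: {cues}\nBackground: {knowledge}\n"
        f"Answer FAVOR, AGAINST, or NEUTRAL."
    )
```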
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.