Related papers: DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

URL: http://arxiv.org/abs/2403.04233v2
Date: Sun, 16 Jun 2024 06:44:50 GMT
Title: DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning
Authors: Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang,
Abstract summary: It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities. We introduce DEEP-ICL, a novel task Definition Enriched ExPert Ensembling methodology for ICL. We argue that improvement from ICL does not directly rely on model size, but essentially stems from understanding task definitions and task-guided learning.
Score: 37.22553531518853
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities, enabling remarkable performance improvements by leveraging task-specific demonstrations. Challenging this hypothesis, we introduce DEEP-ICL, a novel task Definition Enriched ExPert Ensembling methodology for ICL. DEEP-ICL explicitly extracts task definitions from given demonstrations and generates responses through learning task-specific examples. We argue that improvement from ICL does not directly rely on model size, but essentially stems from understanding task definitions and task-guided learning. Inspired by this, DEEP-ICL combines two 3B models with distinct roles (one for concluding task definitions and the other for learning task demonstrations) and achieves comparable performance to LLaMA2-13B. Furthermore, our framework outperforms conventional ICL by overcoming pretraining sequence length limitations, by supporting unlimited demonstrations. We contend that DEEP-ICL presents a novel alternative for achieving efficient few-shot learning, extending beyond the conventional ICL.

Related papers

Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models [11.913271486031201]
We develop a Context-aware instructional task assistant with multi-modal large language models (InsTALL) InsTALL responds in real-time to user queries related to the task at hand. We show InsTALL achieves state-of-the-art performance across proposed sub-tasks considered for multimodal activity understanding.
arXiv Detail & Related papers (2025-01-21T15:55:06Z)
Multimodal Contrastive In-Context Learning [0.9120312014267044]
This paper introduces a novel multimodal contrastive in-context learning framework to enhance our understanding of gradient-free in-context learning (ICL) in Large Language Models (LLMs) First, we present a contrastive learning-based interpretation of ICL in real-world settings, marking the distance of the key-value representation as the differentiator in ICL. Second, we develop an analytical framework to address biases in multimodal input formatting for real-world datasets. Third, we propose an on-the-fly approach for ICL that demonstrates effectiveness in detecting hateful memes.
arXiv Detail & Related papers (2024-08-23T10:10:01Z)
Large Language Models Know What Makes Exemplary Contexts [42.90814615222177]
In-context learning (ICL) has proven to be a significant capability with the advancement of Large Language models (LLMs) This paper presents a unified framework for LLMs that allows them to self-select influential in-context examples to compose their contexts.
arXiv Detail & Related papers (2024-08-14T12:32:41Z)
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model [11.885204227946549]
We propose a comprehensive model designed to represent various tasks using a unified representation. Our model exhibits strong capabilities in comprehending the implicit intent of user instructions. Our approach exhibits exceptional scalability and generality.
arXiv Detail & Related papers (2024-08-05T14:27:39Z)
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models [47.162575147632396]
Transferable Visual Prompting (TVP) is a simple and effective approach to generate visual prompts that can transfer to different models and improve their performance on downstream tasks after trained on only one model. We introduce two strategies to address the issue of cross-model feature corruption of existing visual prompting methods and enhance the transferability of the learned prompts.
arXiv Detail & Related papers (2024-04-17T09:39:07Z)
Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks. Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing. Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples. We propose a novel method named parallel in-context learning (ParaICL)
arXiv Detail & Related papers (2024-03-31T05:56:15Z)
Towards Unified Task Embeddings Across Multiple Models: Bridging the Gap for Prompt-Based Large Language Models and Beyond [16.913115978881866]
We propose a framework for unified task embeddings (FUTE), task embeddings from various models, including smaller language models and Large Language Models with varied prompts, within a single vector space. Such uniformity enables comparison and analysis of similarities amongst different models, broadening the scope and utility of existing task embedding methods in multi-model scenarios.
arXiv Detail & Related papers (2024-02-22T13:13:31Z)
Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning [9.660673938961416]
Demonstration ordering is an important strategy for in-context learning (ICL) We propose a simple but effective demonstration ordering method for ICL, named the few-shot In-Context Curriculum Learning (ICCL)
arXiv Detail & Related papers (2024-02-16T14:55:33Z)
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs. We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks [54.71034943526973]
In-context learning (ICL) has become the default method for using large language models (LLMs) We find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications. We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
arXiv Detail & Related papers (2023-11-15T14:26:30Z)
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion [48.69347134411864]
Embodied MultiModal Agent (EMMA) is a unified encoder-decoder model that reasons over images and trajectories. By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks.
arXiv Detail & Related papers (2023-11-07T15:27:52Z)
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. How to select new tasks to improve the performance and generalizability of IT models remains an open question. We propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
Scaling In-Context Demonstrations with Structured Attention [75.41845145597875]
We propose a better architectural design for in-context learning. Structured Attention for In-Context Learning replaces the full-attention by a structured attention mechanism. We show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up.
arXiv Detail & Related papers (2023-07-05T23:26:01Z)
Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning [74.70157466822612]
We systematically study the role of task definitions in instruction learning. We find that model performance drops substantially when removing contents describing the task output. We propose two strategies to help models better leverage task instructions.
arXiv Detail & Related papers (2023-06-01T21:11:24Z)
Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs) Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages. The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning [24.395288160951118]
Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations. We characterize two ways through which ICL leverages demonstrations. We show that models can achieve non-trivial performance with only TR, and TR does not further improve with larger models or more demonstrations.
arXiv Detail & Related papers (2023-05-16T18:05:19Z)
CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions. We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD. Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.