Naive Bayes-based Context Extension for Large Language Models
- URL: http://arxiv.org/abs/2403.17552v1
- Date: Tue, 26 Mar 2024 09:59:45 GMT
- Title: Naive Bayes-based Context Extension for Large Language Models
- Authors: Jianlin Su, Murtadha Ahmed, Wenbo, Luo Ao, Mingren Zhu, Yunfeng Liu
- Abstract summary: We introduce a novel framework, called Naive Bayes-based Context Extension (NBCE).
NBCE enables existing Large Language Models (LLMs) to perform In-Context Learning (ICL) with an increased number of demonstrations.
NBCE substantially enhances performance, particularly as the number of demonstration examples increases.
- Score: 2.743675474582704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown promising in-context learning abilities. However, conventional In-Context Learning (ICL) approaches are often impeded by the length limitations of the Transformer architecture, which pose challenges when attempting to effectively integrate supervision from a substantial number of demonstration examples. In this paper, we introduce a novel framework, called Naive Bayes-based Context Extension (NBCE), that enables existing LLMs to perform ICL with an increased number of demonstrations by significantly expanding their context size. Importantly, this expansion requires neither fine-tuning nor dependence on particular model architectures, all while preserving linear efficiency. NBCE first splits the context into equal-sized windows that fit the target LLM's maximum length. It then introduces a voting mechanism to select the most relevant window, regarded as the posterior context. Finally, it employs Bayes' theorem to generate the output for the test task. Our experimental results demonstrate that NBCE substantially enhances performance, particularly as the number of demonstration examples increases, consistently outperforming alternative methods. The NBCE code is publicly available at: https://github.com/amurtadha/NBCE-master
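The pipeline above can be summarized as a per-step logit combination. Below is a minimal, hypothetical sketch of that step, assuming the voting mechanism is an entropy-based selection over the per-window next-token distributions, that an unconditional (context-free) pass approximates the prior log p(T), and that beta is an interpolation hyperparameter; the names and values are illustrative and this is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def nbce_combine(window_logits: torch.Tensor,
                 prior_logits: torch.Tensor,
                 beta: float = 0.25) -> torch.Tensor:
    """Hedged sketch of an NBCE-style per-step logit combination (not the official code).

    window_logits: (n_windows, vocab_size) next-token logits, one row per
                   context window S_k fed to the LLM independently.
    prior_logits:  (vocab_size,) next-token logits from a context-free pass,
                   approximating log p(T).
    beta:          assumed interpolation hyperparameter.
    """
    log_post = F.log_softmax(window_logits, dim=-1)    # log p(T | S_k)
    log_prior = F.log_softmax(prior_logits, dim=-1)    # log p(T)

    # Voting step: pick the window whose predictive distribution is most
    # confident (lowest entropy); the abstract calls this the posterior context.
    entropy = -(log_post.exp() * log_post).sum(dim=-1)
    k_star = int(entropy.argmin())

    # Bayes-style combination: up-weight the chosen window's evidence and
    # subtract the context-free prior (the naive Bayes correction term).
    return (1.0 + beta) * log_post[k_star] - beta * log_prior
```

At each decoding step this requires one forward pass per window plus one context-free pass, so the cost grows linearly with the number of windows, consistent with the abstract's linear-efficiency claim; the next token is then sampled from the returned logits as usual.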
Related papers
- Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines [71.14354526117958]
In-context learning (ICL) is an important yet not fully understood ability of pre-trained large language models (LLMs).
We present LongGuide, which efficiently generates two parallel streams of guidelines capturing task language and format properties.
LongGuide automatically selects the best combination of guidelines, improving both strong open- and closed-source LLMs by over 5% in both zero- and few-shot settings.
arXiv Detail & Related papers (2025-06-02T02:35:24Z) - Efficient Multi-modal Long Context Learning for Training-free Adaptation [96.21248144937627]
This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC).
It embeds demonstration examples directly into the model input.
It condenses long-context multimodal inputs into compact, task-specific memory representations.
arXiv Detail & Related papers (2025-05-26T10:49:44Z) - MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning [0.0]
We introduce Mitigating Attention Dispersion in large-scale ICL (MateICL).
We show that MateICL can effectively leverage larger contexts to improve ICL performance.
Despite advances in inference strategies, our results demonstrate that MateICL remains beneficial in computationally resource-constrained settings.
arXiv Detail & Related papers (2025-05-02T08:45:45Z) - VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding [48.26536049440913]
Video large multimodal models (LMMs) have significantly improved their video understanding and reasoning capabilities.
Their performance drops on out-of-distribution (OOD) tasks that are underrepresented in training data.
Traditional methods like fine-tuning on OOD datasets are impractical due to high computational costs.
We propose VideoICL, a novel video in-context learning framework for OOD tasks.
arXiv Detail & Related papers (2024-12-03T05:54:43Z) - Why Does the Effective Context Length of LLMs Fall Short? [68.34573617977013]
In this work, we introduce ShifTed Rotray position embeddING (STRING).
STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths.
Experimental results show that STRING dramatically improves the performance of the latest large-scale models.
arXiv Detail & Related papers (2024-10-24T13:51:50Z) - In-Context Learning with Long-Context Models: An In-Depth Exploration [92.16922648612807]
We show that, for many datasets with large label spaces, performance continues to increase with thousands of demonstrations.
We show that long-context ICL can be an effective tool, and may not require a long context for encoding the demonstration set at all.
arXiv Detail & Related papers (2024-04-30T21:06:52Z) - ParaICL: Towards Robust Parallel In-Context Learning [74.38022919598443]
Large language models (LLMs) have become the norm in natural language processing.
Few-shot in-context learning (ICL) relies on the choice of few-shot demonstration examples.
We propose a novel method named parallel in-context learning (ParaICL).
arXiv Detail & Related papers (2024-03-31T05:56:15Z) - Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge [47.750073410717604]
We introduce Temporal Grounding Bridge (TGB), a novel framework that bootstraps MLLMs with advanced temporal grounding capabilities.
We validate TGB across seven video benchmarks and demonstrate substantial performance improvements compared with prior MLLMs.
Our model, initially trained on sequences of four frames, effectively handles sequences up to 16 times longer without sacrificing performance.
arXiv Detail & Related papers (2024-02-25T10:27:46Z) - EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism [70.07661254213181]
We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs).
Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting.
Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead.
arXiv Detail & Related papers (2023-12-08T09:31:50Z) - DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase [37.68804898063595]
In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks.
We propose Data Augmentation for In-Context Learning (DAIL).
arXiv Detail & Related papers (2023-11-06T18:12:55Z) - CLEX: Continuous Length Extrapolation for Large Language Models [68.43814043853347]
We propose Continuous Length EXtrapolation (CLEX) for Large Language Models (LLMs).
CLEX extends the context window to over 4x or almost 8x training length, with no deterioration in performance.
Our model trained on a 4k length exhibits competitive performance against state-of-the-art open-source models trained on context lengths up to 32k.
arXiv Detail & Related papers (2023-10-25T08:13:02Z) - CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model [22.870512676002463]
This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators.
Inspired by these observations, we propose CRaSh, involving Clustering, Removing, and Sharing, a training-free strategy to derive improved emulators from LLMs.
Our findings demonstrate a linear connectivity among the resulting optima, which fall within the same loss basin, thereby highlighting the effectiveness of CRaSh and OFT.
arXiv Detail & Related papers (2023-10-24T03:08:58Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression (a brief illustrative sketch of this correspondence follows this entry).
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
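As a rough intuition for the correspondence described in the last entry: a single softmax-attention readout coincides with a Nadaraya-Watson kernel-regression estimate that uses an exponential kernel over the keys. The toy check below only illustrates that general identity under made-up tensor shapes; it is not the paper's analysis.

```python
import torch

d = 8
q = torch.randn(d)       # one query vector
K = torch.randn(16, d)   # keys of 16 in-context tokens
V = torch.randn(16, 4)   # their values

# Standard scaled dot-product attention for a single query.
attn = torch.softmax(K @ q / d**0.5, dim=0) @ V

# Nadaraya-Watson kernel regression with kernel k(q, k_i) = exp(q . k_i / sqrt(d)).
w = torch.exp(K @ q / d**0.5)
kernel_reg = (w[:, None] * V).sum(dim=0) / w.sum()

assert torch.allclose(attn, kernel_reg, atol=1e-5)
```

On this reading, the in-context demonstrations play the role of the regressor's training pairs and the query token plays the role of the test input.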
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.