Flexibly Scaling Large Language Models Contexts Through Extensible
Tokenization
- URL: http://arxiv.org/abs/2401.07793v1
- Date: Mon, 15 Jan 2024 16:00:50 GMT
- Title: Flexibly Scaling Large Language Models Contexts Through Extensible
Tokenization
- Authors: Ninglu Shao and Shitao Xiao and Zheng Liu and Peitian Zhang
- Abstract summary: Large language models (LLMs) are in need of sufficient contexts to handle many critical applications.
Although the size of the context window can be extended by fine-tuning, doing so incurs a substantial cost in both the training and inference stages.
We present Extensible Tokenization as an alternative method which realizes the flexible scaling of LLMs' context.
- Score: 6.9004592877749005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are in need of sufficient contexts to handle
many critical applications, such as retrieval augmented generation and few-shot
learning. However, due to the constrained window size, LLMs can only access
information within a limited context. Although the size of the context
window can be extended by fine-tuning, doing so incurs a substantial cost in
both the training and inference stages. In this paper, we present Extensible
Tokenization as an alternative method which realizes the flexible scaling of
LLMs' context. Extensible Tokenization acts as middleware between the
tokenized context and the LLM, transforming the raw token embeddings into
extensible embeddings. Such embeddings provide a more compact
representation for the long context, on top of which the LLM is able to
perceive more information with the same context window. Extensible Tokenization
is also flexible: the scaling factor can be chosen freely within a feasible
range, allowing the context to be extended to an arbitrary length at
inference time. In addition, Extensible Tokenization is
introduced as a drop-in component, which can be seamlessly plugged into not
only the LLM itself but also its fine-tuned derivatives, bringing in the
extended contextual information while fully preserving the LLM's existing
capabilities. We perform comprehensive experiments on long-context language
modeling and understanding tasks, which verify Extensible Tokenization as an
effective, efficient, flexible, and compatible method for extending an LLM's context.
Our model and source code will be made publicly available.
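
The abstract describes compact embeddings and a scaling factor but gives no implementation detail. The following is only a rough sketch of the idea, not the authors' method: it assumes mean pooling over groups of k token embeddings as a stand-in for the learned transformation, and the names compress_embeddings and effective_context are hypothetical.

```python
from typing import List

Vector = List[float]


def compress_embeddings(token_embeddings: List[Vector], k: int) -> List[Vector]:
    """Pool every k consecutive token embeddings into one compact vector.

    Assumption: mean pooling stands in for the learned transformation,
    which the abstract does not specify. The trailing group may contain
    fewer than k vectors.
    """
    if k < 1:
        raise ValueError("scaling factor k must be >= 1")
    compact: List[Vector] = []
    for start in range(0, len(token_embeddings), k):
        group = token_embeddings[start:start + k]
        dim = len(group[0])
        # Average each dimension across the vectors in this group.
        compact.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return compact


def effective_context(window_size: int, k: int) -> int:
    """Raw tokens covered when each window slot summarizes k tokens."""
    return window_size * k


if __name__ == "__main__":
    # Eight toy 2-dimensional embeddings: [0, 0], [1, 2], ..., [7, 14].
    embeddings = [[float(i), float(i) * 2.0] for i in range(8)]
    compact = compress_embeddings(embeddings, k=4)
    print(len(compact))                # 2 compact vectors for 8 tokens
    print(effective_context(4096, 8))  # 32768
```

Under this assumption, choosing a larger k at inference time lets the same window cover proportionally more raw tokens, at the price of a coarser representation per slot.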
Related papers
- ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning [72.90823351726374]
We introduce the Unified framework for Large Language Model Embedding (ULLME), a flexible, plug-and-play implementation that enables bidirectional attention across various LLMs.
We also propose Generation-augmented Representation Learning (GRL), a novel fine-tuning method to boost LLMs for text embedding tasks.
To showcase our framework's flexibility and effectiveness, we release three pre-trained models from ULLME with different backbone architectures.
arXiv Detail & Related papers (2024-08-06T18:53:54Z)
- Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z)
- Fine-tuning Multimodal Large Language Models for Product Bundling [53.01642741096356]
We introduce Bundle-MLLM, a novel framework that fine-tunes large language models (LLMs) through a hybrid item tokenization approach.
Specifically, we integrate textual, media, and relational data into a unified tokenization, introducing a soft separation token to distinguish between textual and non-textual tokens.
We propose a progressive optimization strategy that fine-tunes LLMs for disentangled objectives: 1) learning bundle patterns and 2) enhancing multimodal semantic understanding specific to product bundling.
arXiv Detail & Related papers (2024-07-16T13:30:14Z)
- Soft Prompting for Unlearning in Large Language Models [11.504012974208466]
This work focuses on investigating machine unlearning for Large Language Models motivated by data protection regulations.
We propose a framework, Soft Prompting for Unlearning (SPUL).
We conduct a rigorous evaluation of the proposed method and our results indicate that SPUL can significantly improve the trade-off between utility and forgetting.
arXiv Detail & Related papers (2024-06-17T19:11:40Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Extensible Embedding: A Flexible Multipler For LLM's Context Length [6.9004592877749005]
Large language models (LLMs) call for extension of context to handle many critical applications.
Existing approaches are prone to expensive costs and inferior quality of context extension.
We propose Extensible Embedding, which realizes high-quality extension of LLM's context with strong flexibility and cost-effectiveness.
arXiv Detail & Related papers (2024-02-18T12:50:19Z)
- BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models [13.229325187638432]
Large language models (LLMs) call for extension of context to handle many critical applications.
Existing approaches are prone to expensive costs and inferior quality of context extension.
Extensible embedding stands as an enhancement of typical token embedding.
arXiv Detail & Related papers (2024-02-18T12:41:01Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Towards More Unified In-context Visual Understanding [74.55332581979292]
We present a new ICL framework for visual understanding with multi-modal output enabled.
First, we quantize and embed both text and visual prompt into a unified representational space.
Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them.
arXiv Detail & Related papers (2023-12-05T06:02:21Z)
- Let Models Speak Ciphers: Multiagent Debate through Embeddings [84.20336971784495]
We introduce CIPHER (Communicative Inter-Model Protocol Through Embedding Representation) to address this issue.
By deviating from natural language, CIPHER offers an advantage of encoding a broader spectrum of information without any modification to the model weights.
This showcases the superiority and robustness of embeddings as an alternative "language" for communication among LLMs.
arXiv Detail & Related papers (2023-10-10T03:06:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.