E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
- URL: http://arxiv.org/abs/2409.06679v1
- Date: Tue, 10 Sep 2024 17:44:35 GMT
- Title: E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
- Authors: Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
- Abstract summary: We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates the "impossible triangle" of long-context performance, computational complexity, and compatibility with pretrained models.
The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM.
Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models.
- Score: 20.660297311025417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of Large Language Models (LLMs), the ability to process long contexts is increasingly crucial for tasks such as multi-round dialogues, code generation, and document summarization. This paper addresses the challenges of enhancing long-context performance, reducing computational complexity, and leveraging pretrained models, collectively termed the "impossible triangle." We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this paradox. The method involves splitting long contexts into chunks, compressing each into embedding vectors via a pretrained text encoder, and utilizing an adapter to align these representations with a decoder-only LLM. Two training objectives, focusing on reconstruction of the encoder output and long-context instruction fine-tuning, are employed to facilitate the understanding of soft prompts by the LLM. Experimental results demonstrate that E2LLM achieves superior performance in long-context scenarios while balancing efficiency, performance, and compatibility with pretrained models. Our framework thus represents a significant advancement in the field, contributing to effective long-text modeling.
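To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the chunk-encode-adapt flow: the long context is split into chunks, each chunk is compressed to one embedding by a pretrained encoder, and an adapter maps these embeddings into soft prompts for a decoder-only LLM. The module names, dimensions, mean-pooling choice, and HuggingFace-style interfaces (last_hidden_state, inputs_embeds) are illustrative assumptions, not the authors' released implementation; the two training objectives (encoder-output reconstruction and long-context instruction fine-tuning) would be applied on top of this forward pass.

```python
import torch
import torch.nn as nn

class E2LLMSketch(nn.Module):
    """Illustrative sketch of the E2LLM idea: chunk -> encode -> adapt -> soft prompts.
    Hypothetical modules and dimensions; not the authors' released code."""

    def __init__(self, encoder, decoder, enc_dim=768, dec_dim=4096, chunk_len=512):
        super().__init__()
        self.encoder = encoder          # pretrained text encoder (e.g., a BERT-style model)
        self.decoder = decoder          # pretrained decoder-only LLM
        self.chunk_len = chunk_len
        # Adapter that aligns encoder embeddings with the decoder's embedding space.
        self.adapter = nn.Sequential(
            nn.Linear(enc_dim, dec_dim),
            nn.GELU(),
            nn.Linear(dec_dim, dec_dim),
        )

    def compress(self, context_ids):
        # Split the long context into fixed-length chunks.
        chunks = context_ids.split(self.chunk_len, dim=1)
        chunk_vecs = []
        for chunk in chunks:
            # One embedding vector per chunk (mean pooling is an assumption).
            hidden = self.encoder(chunk).last_hidden_state
            chunk_vecs.append(hidden.mean(dim=1))
        return torch.stack(chunk_vecs, dim=1)      # (batch, num_chunks, enc_dim)

    def forward(self, context_ids, prompt_ids):
        soft_prompts = self.adapter(self.compress(context_ids))   # (batch, num_chunks, dec_dim)
        prompt_embeds = self.decoder.get_input_embeddings()(prompt_ids)
        # Prepend the compressed context as soft prompts to the instruction tokens.
        inputs_embeds = torch.cat([soft_prompts, prompt_embeds], dim=1)
        return self.decoder(inputs_embeds=inputs_embeds)
```

Because the decoder sees one soft prompt per chunk instead of every original token, its attention cost grows with the number of chunks rather than with the raw context length, which is where the efficiency gain in the "impossible triangle" comes from.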
Related papers
- EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings.
EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
arXiv Detail & Related papers (2024-10-02T23:00:31Z)
- Investigating Decoder-only Large Language Models for Speech-to-text Translation [39.17113782374464]
Large language models (LLMs) are known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains.
We propose a decoder-only architecture that enables the LLM to directly consume the encoded speech representation and generate the text translation.
Our model achieves state-of-the-art performance on CoVoST 2 and FLEURS among models trained without proprietary data.
arXiv Detail & Related papers (2024-07-03T14:42:49Z)
- KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches [52.02764371205856]
Long context capability is a crucial competency for large language models (LLMs).
This work provides a taxonomy of current methods and evaluates 10+ state-of-the-art approaches across seven categories of long-context tasks.
arXiv Detail & Related papers (2024-07-01T17:59:47Z)
- ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment [20.868216061750402]
We introduce an Efficient Large Language Model Adapter, termed ELLA, which equips text-to-image diffusion models with powerful Large Language Models (LLMs).
Our approach adapts semantic features at different stages of the denoising process, assisting diffusion models in interpreting lengthy and intricate prompts over sampling timesteps.
To assess text-to-image models in dense prompt following, we introduce a challenging benchmark consisting of 1K dense prompts.
arXiv Detail & Related papers (2024-03-08T08:08:10Z)
- Long-Context Language Modeling with Parallel Context Encoding [37.64884969997378]
We introduce a framework that can be applied to any existing decoder-only LLM to extend its context window.
CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention.
CEPE yields strong performance on language modeling and in-context learning.
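For contrast with E2LLM's soft-prompt compression, here is a minimal sketch of the chunked cross-attention idea described in this blurb: a small encoder processes the long input chunk by chunk, and a frozen decoder consumes the resulting states through a cross-attention layer. Layer placement, dimensions, and the residual injection are assumptions for illustration, not CEPE's actual code.

```python
import torch
import torch.nn as nn

class ChunkCrossAttention(nn.Module):
    """Sketch: encode a long input chunk by chunk, then let a (frozen) decoder
    layer attend to the encoded memory via cross-attention. Hypothetical
    hyperparameters and placement; not the CEPE implementation."""

    def __init__(self, encoder, chunk_len=256, enc_dim=768, dec_dim=4096, n_heads=8):
        super().__init__()
        self.encoder = encoder
        self.chunk_len = chunk_len
        self.proj = nn.Linear(enc_dim, dec_dim)   # align encoder/decoder widths
        self.cross_attn = nn.MultiheadAttention(dec_dim, n_heads, batch_first=True)

    def encode_chunks(self, long_ids):
        # Encode each chunk independently, then concatenate along the sequence axis.
        states = [self.encoder(c).last_hidden_state
                  for c in long_ids.split(self.chunk_len, dim=1)]
        return self.proj(torch.cat(states, dim=1))   # (batch, total_len, dec_dim)

    def forward(self, decoder_hidden, long_ids):
        # decoder_hidden: hidden states from one frozen decoder layer.
        memory = self.encode_chunks(long_ids)
        attended, _ = self.cross_attn(decoder_hidden, memory, memory)
        return decoder_hidden + attended              # residual injection
```

The key design difference from prefix-style compression is that the encoded chunks are queried dynamically at every decoder position instead of being squeezed into a fixed number of soft prompts.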
arXiv Detail & Related papers (2024-02-26T14:47:35Z)
- Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens [15.566726645722657]
We propose a novel framework specifically designed for speculative sampling.
Within this framework, we introduce a lightweight draft model that effectively utilizes previously generated tokens to predict subsequent words.
We demonstrate impressive results, achieving an average latency speedup ratio of 2.7x compared to the vanilla auto-regressive decoding approach.
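Speculative sampling in general follows a propose-then-verify loop: a cheap draft model proposes several tokens, and the target model checks them in a single forward pass. The sketch below shows the greedy variant of that generic loop, assuming batch size 1 and HuggingFace-style .logits outputs; it is not Chimera's specific draft architecture or acceptance rule.

```python
import torch

@torch.no_grad()
def speculative_decode(target_model, draft_model, input_ids, max_new_tokens=128, k=4):
    """Generic propose-then-verify loop (greedy variant, batch size 1).
    Not Chimera's exact method; interfaces are assumed HF-style."""
    ids = input_ids
    while ids.shape[1] - input_ids.shape[1] < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        draft = ids
        for _ in range(k):
            next_tok = draft_model(draft).logits[:, -1].argmax(-1, keepdim=True)
            draft = torch.cat([draft, next_tok], dim=1)
        proposed = draft[:, ids.shape[1]:]                     # (1, k)

        # 2) The target model scores the whole draft in one forward pass.
        logits = target_model(draft).logits
        verified = logits[:, ids.shape[1] - 1:-1].argmax(-1)   # target's own choices

        # 3) Accept the longest agreeing prefix; on the first disagreement,
        #    take the target's own token so the loop always makes progress.
        agree = (proposed == verified).long().cumprod(dim=1).sum().item()
        ids = torch.cat([ids, proposed[:, :agree],
                         verified[:, agree:agree + 1]], dim=1)
    return ids
```

Speedups such as the reported 2.7x come from the target model verifying k drafted tokens per forward pass instead of generating one token at a time.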
arXiv Detail & Related papers (2024-02-24T08:10:39Z)
- Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition [51.66383337087724]
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models.
We propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure.
arXiv Detail & Related papers (2023-12-31T09:24:21Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- In-context Autoencoder for Context Compression in a Large Language Model [70.7621953091318]
We propose the In-context Autoencoder (ICAE) to compress a long context into short compact memory slots.
ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data.
arXiv Detail & Related papers (2023-07-13T17:59:21Z)
- Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder [64.55176104620848]
We propose a novel NAR E2E-ST framework, Orthros, in which both NAR and autoregressive (AR) decoders are jointly trained on the shared speech encoder.
The latter is used to select a better translation among candidates of various lengths generated by the former, which dramatically improves the effectiveness of a large length beam with negligible overhead.
Experiments on four benchmarks show the effectiveness of the proposed method in improving inference speed while maintaining competitive translation quality.
arXiv Detail & Related papers (2020-10-25T06:35:30Z)