OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
- URL: http://arxiv.org/abs/2409.05152v2
- Date: Wed, 2 Oct 2024 05:02:02 GMT
- Title: OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
- Authors: Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang
- Abstract summary: This paper introduces the One-pass Generation and retrieval framework (OneGen).
OneGen bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively.
Results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance.
- Score: 44.054569398300266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face limitations in directly handling retrieval tasks. However, many practical applications demand the seamless integration of both retrieval and generation. This paper introduces a novel and efficient One-pass Generation and retrieval framework (OneGen), designed to improve LLMs' performance on tasks that require both generation and retrieval. The proposed framework bridges the traditionally separate training approaches for generation and retrieval by incorporating retrieval tokens generated autoregressively. This enables a single LLM to handle both tasks simultaneously in a unified forward pass. We conduct experiments on two distinct types of composite tasks, RAG and Entity Linking, to validate the pluggability, effectiveness, and efficiency of OneGen in training and inference. Furthermore, our results show that integrating generation and retrieval within the same context preserves the generative capabilities of LLMs while improving retrieval performance. To the best of our knowledge, OneGen is the first to enable LLMs to conduct vector retrieval during generation.
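As the abstract describes it, the key mechanism is that the hidden state of an autoregressively generated retrieval token doubles as a dense query vector, so one forward pass serves both generation and retrieval. A minimal sketch of that idea follows; the token name `[RQ]`, the GPT-2 backbone, and cosine scoring are illustrative assumptions, not the paper's exact design.

```python
# Sketch: one forward pass yields next-token logits (generation) and a
# query embedding from the hidden state of a special retrieval token.
# "[RQ]", GPT-2, and cosine scoring are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
tok.add_special_tokens({"additional_special_tokens": ["[RQ]"]})
model.resize_token_embeddings(len(tok))

def one_pass(prompt: str, doc_embs: torch.Tensor):
    ids = tok(prompt + " [RQ]", return_tensors="pt").input_ids
    out = model(ids)
    query = out.hidden_states[-1][0, -1]              # hidden state at [RQ]
    scores = torch.nn.functional.cosine_similarity(
        query.unsqueeze(0), doc_embs)                 # rank candidate docs
    return out.logits[0, -1], scores.argmax().item()  # generation + retrieval

doc_embs = torch.randn(4, model.config.hidden_size)   # stand-in doc vectors
next_token_logits, best_doc = one_pass("Who wrote Hamlet?", doc_embs)
```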
Related papers
- Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
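A minimal sketch of the retrieval stage as summarized above: wrap an LLM with LoRA via the PEFT library and pool hidden states into a dense embedding. The GPT-2 backbone, the `c_attn` target module, and mean pooling are assumptions for illustration.

```python
# Sketch: an LLM adapted with LoRA acts as a dense retriever.
# Backbone, target modules, and pooling are illustrative assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
retriever = get_peft_model(AutoModel.from_pretrained("gpt2"),
                           LoraConfig(r=8, target_modules=["c_attn"]))

def embed(texts):
    batch = tok(texts, return_tensors="pt", padding=True)
    hidden = retriever(**batch).last_hidden_state       # (B, T, D)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)         # mean over real tokens

q, d = embed(["the query"]), embed(["a candidate passage"])
score = torch.nn.functional.cosine_similarity(q, d)    # train contrastively
```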
arXiv Detail & Related papers (2024-11-11T14:25:37Z)
- GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings [7.957874169275548]
Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text.
We propose a novel method, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning.
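A sketch of that recipe: prompt an LLM for meaning-preserving variants of the sentence, embed each, and aggregate. The `generate_variants` and `embed` callables are hypothetical stand-ins, and mean aggregation is an assumption.

```python
# Sketch: embed a sentence by averaging embeddings of LLM-generated,
# meaning-preserving transformations. Both callables are hypothetical.
from typing import Callable, List
import torch

def geneol_embed(sentence: str,
                 generate_variants: Callable[[str, int], List[str]],
                 embed: Callable[[str], torch.Tensor],
                 n: int = 4) -> torch.Tensor:
    variants = generate_variants(sentence, n)       # e.g. paraphrases
    vecs = torch.stack([embed(t) for t in [sentence] + variants])
    return vecs.mean(dim=0)                         # aggregate by mean

# Toy usage with dummy callables:
vec = geneol_embed("LLMs can embed text.",
                   lambda s, n: [s.lower()] * n,
                   lambda s: torch.randn(16))
```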
arXiv Detail & Related papers (2024-10-18T17:36:53Z)
- A + B: A General Generator-Reader Framework for Optimizing LLMs to Unleash Synergy Potential [20.1396255995056]
Retrieval-Augmented Generation (RAG) is an effective solution for supplementing large language models (LLMs) with necessary knowledge.
A "generate-then-read" pipeline has been proposed to replace the retrieval stage with generation from the LLM itself.
This paper formalizes a general "A + B" framework with varying combinations of foundation models and types for systematic investigation.
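In its simplest form the framework is a two-stage pipeline: model A generates supporting context and model B reads it to answer. A minimal sketch, with both models as hypothetical callables and the prompt wording our own:

```python
# Sketch of a generator-reader ("A + B") pipeline with pluggable models.
from typing import Callable

def a_plus_b(question: str,
             generator: Callable[[str], str],       # model A
             reader: Callable[[str], str]) -> str:  # model B
    context = generator(f"Write a passage that answers: {question}")
    return reader(f"Context: {context}\nQuestion: {question}\nAnswer:")

answer = a_plus_b("Who wrote Hamlet?",
                  generator=lambda p: "Hamlet is a play by Shakespeare.",
                  reader=lambda p: "William Shakespeare")
```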
arXiv Detail & Related papers (2024-06-06T11:14:27Z)
- One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs).
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
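The idea can be sketched as a handful of trainable embeddings prepended to the input of a frozen LLM; the token count and placement below are assumptions, not the paper's exact configuration.

```python
# Sketch: pluggable virtual tokens = trainable embeddings prepended to a
# frozen LLM's input. Count (4) and placement are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False                       # freeze the base LLM

dim = model.config.hidden_size
virtual = torch.nn.Parameter(0.02 * torch.randn(4, dim))  # only trainables

ids = tok("Retrieved: ... Question: ...", return_tensors="pt").input_ids
tok_embs = model.get_input_embeddings()(ids)      # (1, T, D), frozen
inputs = torch.cat([virtual.unsqueeze(0), tok_embs], dim=1)
logits = model(inputs_embeds=inputs).logits       # grads reach only `virtual`
```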
arXiv Detail & Related papers (2024-05-30T03:44:54Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering (MBC), a method that groups tasks based on the similarity of their adapter parameters.
To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters.
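A toy sketch of both steps, representing each adapter as a single update matrix (a simplification): MBC clusters adapters by parameter similarity, and Arrow-style routing scores each adapter by how strongly its principal direction aligns with the input.

```python
# Toy sketch: cluster LoRA updates by parameter similarity (one k-means
# assignment step) and route zero-shot via principal directions.
import torch

adapters = {f"task{i}": torch.randn(64, 64) for i in range(8)}  # toy updates

# MBC: group tasks by similarity of flattened adapter parameters.
flat = torch.stack([w.flatten() for w in adapters.values()])
centroids = flat[torch.randperm(len(flat))[:2]]        # k = 2
cluster_ids = torch.cdist(flat, centroids).argmin(dim=1)

# Arrow-style routing: pick the adapter whose top right-singular vector
# best aligns with the input representation.
def route(x: torch.Tensor) -> str:
    scores = {name: (x @ torch.linalg.svd(w).Vh[0]).abs().item()
              for name, w in adapters.items()}
    return max(scores, key=scores.get)

best_adapter = route(torch.randn(64))
```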
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking long-sequence code generation into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking out unexecuted segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
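The FGO piece can be sketched as a masked policy-gradient loss in which only tokens from code that actually executed contribute to the update; the coverage mask and reward below are toy values, and real coverage would come from running the generated program.

```python
# Sketch of Fine-Grained Optimization: unexecuted code tokens are masked
# out of the policy loss. Mask and reward are toy stand-ins.
import torch

logprobs = torch.randn(10, requires_grad=True)   # per-token log-probs
advantage = torch.tensor(1.5)                    # e.g. from unit-test reward
executed = torch.tensor([1, 1, 1, 0, 0, 1, 1, 0, 1, 1.0])  # coverage mask

pg_loss = -(advantage * logprobs * executed).sum() / executed.sum()
pg_loss.backward()   # gradients flow only through executed tokens
```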
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize the collective knowledge and unique strengths of the source LLMs, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
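A simplified sketch of one plausible fusion step: combine next-token distributions from several source LLMs into a single target distribution and distill the target model toward it. Weighting sources by their probability on the gold token is our assumption, not the paper's exact rule.

```python
# Sketch: fuse source-LLM token distributions, then distill via KL.
# The gold-token weighting rule is a simplified assumption.
import torch
import torch.nn.functional as F

vocab, gold = 100, 7
source_logits = [torch.randn(vocab) for _ in range(3)]    # source LLMs
target_logits = torch.randn(vocab, requires_grad=True)    # target model

probs = torch.stack([F.softmax(l, dim=-1) for l in source_logits])
weights = probs[:, gold] / probs[:, gold].sum()           # trust better sources
fused = (weights.unsqueeze(1) * probs).sum(dim=0)         # fused distribution

loss = F.kl_div(F.log_softmax(target_logits, dim=-1), fused, reduction="sum")
loss.backward()
```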
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- Instruction Fusion: Advancing Prompt Evolution through Hybridization [27.321629102942754]
This paper examines the constraints of existing prompt evolution techniques and introduces a novel approach, Instruction Fusion (IF).
IF innovatively combines two distinct prompts through a hybridization process, thereby enhancing the evolution of training prompts for code LLMs.
Our experimental results reveal that the proposed method effectively addresses the shortcomings of prior approaches, significantly improving the performance of Code LLMs.
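A sketch of the hybridization step: a meta-prompt asks an LLM to fuse two seed instructions into one harder training prompt. `ask_llm` is a hypothetical stand-in for any completion call, and the meta-prompt wording is ours.

```python
# Sketch: fuse two seed instructions into one hybrid training prompt.
# `ask_llm` and the template wording are hypothetical.
from typing import Callable

FUSE_TEMPLATE = (
    "Combine the two coding instructions below into a single coherent "
    "instruction that requires solving both:\n"
    "Instruction 1: {a}\nInstruction 2: {b}\nFused instruction:"
)

def instruction_fusion(a: str, b: str, ask_llm: Callable[[str], str]) -> str:
    return ask_llm(FUSE_TEMPLATE.format(a=a, b=b))

fused = instruction_fusion("Reverse a string.",
                           "Check whether a string is a palindrome.",
                           ask_llm=lambda p: "Write a function that ...")
```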
arXiv Detail & Related papers (2023-12-25T11:00:37Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
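Under this view the LM's logits act as Q-values, the soft state value is a logsumexp over them, and training regresses Q(s, a) toward r + V(s'). A one-step sketch with toy shapes:

```python
# Sketch of a soft Q-learning update for text generation: logits are
# Q-values and the Bellman target uses a logsumexp soft value.
import torch

q_logits = torch.randn(50, requires_grad=True)  # Q(s, .) from the LM
next_q = torch.randn(50)                        # Q(s', .), held fixed
action, reward = 3, torch.tensor(0.7)           # taken token, task reward

v_next = torch.logsumexp(next_q, dim=0)         # soft state value V(s')
td_target = (reward + v_next).detach()
loss = (q_logits[action] - td_target) ** 2      # soft Bellman regression
loss.backward()
```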
arXiv Detail & Related papers (2021-06-14T18:48:40Z)