Reinforcement Learning for Optimizing RAG for Domain Chatbots
- URL: http://arxiv.org/abs/2401.06800v1
- Date: Wed, 10 Jan 2024 02:57:20 GMT
- Title: Reinforcement Learning for Optimizing RAG for Domain Chatbots
- Authors: Mandar Kulkarni, Praveen Tangarajan, Kyung Kim, Anusua Trivedi
- Abstract summary: This paper describes a RAG-based approach for building a bot that answers users' queries using Frequently Asked Questions (FAQ) data.
We train an in-house retrieval embedding model using the InfoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model.
We propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost.
- Score: 4.12484724941528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of Large Language Models (LLM), conversational assistants
have become prevalent for domain use cases. LLMs acquire the ability to perform
contextual question answering through training, and Retrieval Augmented
Generation (RAG) further enables the bot to answer domain-specific questions.
This paper describes a RAG-based approach for building a chatbot that answers
users' queries using Frequently Asked Questions (FAQ) data. We train an
in-house retrieval embedding model using the InfoNCE loss, and experimental results
demonstrate that the in-house model works significantly better than the
well-known general-purpose public embedding model, both in terms of retrieval
accuracy and Out-of-Domain (OOD) query detection. As an LLM, we use an open
API-based paid ChatGPT model. We noticed that a previously retrieved context
could be used to generate an answer for specific patterns/sequences of queries
(e.g., follow-up queries). Hence, there is scope to optimize the number of LLM
tokens and, thereby, the cost. Assuming a fixed retrieval model and an LLM, we optimize
the number of LLM tokens using Reinforcement Learning (RL). Specifically, we
propose a policy-based model external to the RAG, which interacts with the RAG
pipeline through policy actions and updates the policy to optimize the cost.
The policy model can perform one of two actions: fetch the FAQ context or skip
retrieval. We use the open API-based GPT-4 as the reward model. We then train a
policy model using policy gradient on multiple training chat sessions. As a
policy model, we experimented with a public GPT-2 model and an in-house BERT
model. With the proposed RL-based optimization combined with a similarity
threshold, we achieve significant cost savings while obtaining slightly
improved accuracy. Though we demonstrate results for the FAQ chatbot, the
proposed RL approach is generic and can be applied to any existing RAG
pipeline.
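For reference, the InfoNCE objective mentioned above has the standard contrastive form shown below (generic notation, not necessarily the paper's): given a query embedding $q$, a positive FAQ passage $p^{+}$, $N$ candidate passages (the positive plus in-batch negatives), a similarity function $\mathrm{sim}$ (typically cosine similarity), and a temperature $\tau$,

$$
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\!\big(\mathrm{sim}(q, p^{+})/\tau\big)}{\sum_{i=1}^{N} \exp\!\big(\mathrm{sim}(q, p_{i})/\tau\big)}.
$$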
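The RL component can be pictured with a minimal REINFORCE-style sketch. The code below is an illustrative approximation of the described setup, not the authors' implementation; names such as PolicyNet, encode_dialog_state, and reward_model are hypothetical placeholders (in the paper, the policy is a GPT-2 or BERT model and the reward comes from GPT-4).

```python
import torch
import torch.nn as nn

FETCH, SKIP = 0, 1  # the two policy actions described in the abstract


class PolicyNet(nn.Module):
    """Tiny action head over a dialogue-state embedding (hypothetical stand-in
    for the GPT-2/BERT policy models mentioned in the paper)."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, state_emb: torch.Tensor) -> torch.Tensor:
        return torch.log_softmax(self.head(state_emb), dim=-1)


def encode_dialog_state(turn) -> torch.Tensor:
    """Placeholder: embed the dialogue state (current query plus recent turns)."""
    return torch.randn(1, 768)  # dummy embedding so the sketch runs


def reward_model(session, actions) -> float:
    """Placeholder for the GPT-4-based reward (answer quality, token cost saved)."""
    return 0.0  # dummy scalar so the sketch runs


policy = PolicyNet()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)


def train_on_session(session):
    """One REINFORCE (policy-gradient) update on a single chat session."""
    log_probs, actions = [], []
    for turn in session:
        state = encode_dialog_state(turn)
        logp = policy(state).squeeze(0)                   # log-probs over {FETCH, SKIP}
        action = torch.multinomial(logp.exp(), 1).item()  # sample an action
        log_probs.append(logp[action])
        actions.append(action)
    reward = reward_model(session, actions)               # scalar session-level reward
    loss = -reward * torch.stack(log_probs).sum()         # REINFORCE loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


train_on_session(["What is the cashback policy?", "Does it apply to my card?"])
```

In the paper's setting, the reward would combine the GPT-4 judgment of answer quality with the token cost saved by skipping retrieval, so the policy learns to reuse previously retrieved context for follow-up queries.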
Related papers
- RARe: Retrieval Augmented Retrieval with In-Context Examples [40.963703726988946]
We introduce a simple approach to enable retrievers to use in-context examples.
RARe finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query.
We find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples.
arXiv Detail & Related papers (2024-10-26T05:46:20Z) - ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama 3.0-based model with a 128K context window.
We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.
Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-19T17:35:47Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - RAFT: Adapting Language Model to Domain Specific RAG [75.63623523051491]
We present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book", in-domain setting.
RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question.
RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets.
arXiv Detail & Related papers (2024-03-15T09:26:02Z) - Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval
Augmented Generation Models for Open Book Question-Answering [0.0]
We propose a framework to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents.
The framework adapts a retrieval augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2023-07-12T04:44:31Z) - Enhancing In-Context Learning with Answer Feedback for Multi-Span
Question Answering [9.158919909909146]
In this paper, we propose a novel way of employing labeled data such that it informs the LLM of some undesired output.
Experiments on three multi-span question answering datasets and a keyphrase extraction dataset show that our new prompting strategy consistently improves LLM's in-context learning performance.
arXiv Detail & Related papers (2023-06-07T15:20:24Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the Large language model as Retriever (LameR), is built upon no other neural models but an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z) - Offline RL with No OOD Actions: In-Sample Learning via Implicit Value
Regularization [90.9780151608281]
In-sample learning (IQL) improves the policy by quantile regression using only data samples.
We make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework.
We propose two practical algorithms, Sparse $Q$-learning (SQL) and Exponential $Q$-learning (EQL), which adopt the same value regularization used in existing works.
arXiv Detail & Related papers (2023-03-28T08:30:01Z)