LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs
- URL: http://arxiv.org/abs/2309.00841v1
- Date: Sat, 2 Sep 2023 06:33:18 GMT
- Title: LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs
- Authors: Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar
- Abstract summary: Question-answering (QA) is a significant application of Large Language Models (LLMs).
In this paper, we shift from human-oriented summarizers to AI model-friendly summaries.
Our approach, LeanContext, efficiently extracts $k$ key sentences from the context that are closely aligned with the query.
- Score: 1.9468358338146958
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Question-answering (QA) is a significant application of Large Language Models
(LLMs), shaping chatbot capabilities across healthcare, education, and customer
service. However, widespread LLM integration presents a challenge for small
businesses due to the high expenses of LLM API usage. Costs rise rapidly when
domain-specific data (context) is used alongside queries for accurate
domain-specific LLM responses. One option is to use an LLM to summarize the
context and thereby reduce its size. However, summarization can also filter out useful
information that is necessary to answer some domain-specific queries. In this
paper, we shift from human-oriented summarizers to AI model-friendly summaries.
Our approach, LeanContext, efficiently extracts $k$ key sentences from the
context that are closely aligned with the query. The choice of $k$ is neither
static nor random; we introduce a reinforcement learning technique that
dynamically determines $k$ based on the query and context. The remaining, less
important sentences are reduced using a free, open-source text reduction method.
We evaluate LeanContext against several recent query-aware and query-unaware
context reduction approaches on prominent datasets (arXiv papers and BBC news
articles). Despite cost reductions of $37.29\%$ to $67.81\%$, LeanContext's
ROUGE-1 score decreases only by $1.41\%$ to $2.65\%$ compared to a baseline
that retains the entire context (no summarization). Additionally, if free
pretrained LLM-based summarizers are used to reduce the context (into
human-consumable summaries), LeanContext can further modify the reduced context to
enhance the accuracy (ROUGE-1 score) by $13.22\%$ to $24.61\%$.
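As a rough illustration of the approach described in the abstract, the sketch below keeps the $k$ context sentences most similar to the query and crudely shortens the rest. It is not the authors' implementation: TF-IDF cosine similarity stands in for their embedding model, $k$ is a fixed argument rather than the output of their reinforcement-learning policy, and simple word truncation stands in for the open-source text reducer.

```python
# Minimal sketch of query-aware context reduction in the spirit of LeanContext.
# Assumptions (not from the paper's code): TF-IDF similarity, fixed k, naive reducer.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lean_context(context: str, query: str, k: int = 3) -> str:
    # Split the context into sentences (naive splitter, good enough for a sketch).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    # Score each sentence against the query with TF-IDF cosine similarity.
    vectorizer = TfidfVectorizer().fit(sentences + [query])
    scores = cosine_similarity(vectorizer.transform([query]),
                               vectorizer.transform(sentences))[0]
    top_idx = set(sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k])
    reduced = []
    for i, sent in enumerate(sentences):
        if i in top_idx:
            reduced.append(sent)  # keep query-relevant sentences verbatim, in original order
        else:
            # Crude stand-in for an open-source text reducer: keep the first third of the words.
            words = sent.split()
            reduced.append(" ".join(words[: max(3, len(words) // 3)]) + " ...")
    return " ".join(reduced)

question = "How does LeanContext cut API cost?"
context = ("LLM APIs are billed per input and output token. "
           "LeanContext keeps only the sentences most relevant to the query. "
           "The remaining sentences are aggressively shortened before the prompt is sent. "
           "The office cafeteria serves lunch at noon.")
print(lean_context(context, question, k=2))
```

In the paper, the choice of $k$ is made per query by the reinforcement-learning policy; here it is exposed as a plain parameter so the selection step stays visible.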
Related papers
- Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval [55.63711219190506]
Large language models (LLMs) often struggle with posing the right search queries.
We introduce $\underline{Le}$arning to $\underline{Re}$trieve by $\underline{T}$rying (LeReT).
LeReT can improve the absolute retrieval accuracy by up to 29% and the downstream generator evaluations by 17%.
arXiv Detail & Related papers (2024-10-30T17:02:54Z)
- QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression [37.08536175557748]
In this paper, we introduce a novel Query-gUIded aTtention cOmpression (QUITO) method to filter useless information.
Specifically, we use a trigger token to calculate the attention distribution of the context in response to the question.
We evaluate QUITO on two widely used datasets, NaturalQuestions and ASQA.
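A rough sketch of the trigger-token idea described above, not the authors' code: GPT-2 stands in for the scoring model, the final question token plays the role of the trigger token, and a fixed keep ratio replaces QUITO's budgeting scheme.

```python
# Sketch: keep the context tokens that receive the most attention from a trigger token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def compress(context: str, question: str, keep_ratio: float = 0.5) -> str:
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids[0]
    q_ids = tokenizer(" " + question, return_tensors="pt").input_ids[0]
    input_ids = torch.cat([ctx_ids, q_ids]).unsqueeze(0)
    with torch.no_grad():
        out = model(input_ids, output_attentions=True)
    # Attention paid by the last question token (the "trigger") to each context position,
    # averaged over the heads of the final layer.
    attn = out.attentions[-1][0].mean(dim=0)[-1, : len(ctx_ids)]
    k = max(1, int(keep_ratio * len(ctx_ids)))
    keep = sorted(torch.topk(attn, k).indices.tolist())  # preserve original token order
    return tokenizer.decode(ctx_ids[keep])

print(compress("Paris is the capital of France. Berlin is the capital of Germany.",
               "What is the capital of France?"))
```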
arXiv Detail & Related papers (2024-08-01T04:28:38Z)
- Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities [30.1331670544648]
Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-extensive tasks.
We propose $\textit{Refiner}$, an end-to-end extract-and-restructure paradigm that operates in the post-retrieval process of RAG.
arXiv Detail & Related papers (2024-06-17T09:25:10Z)
- LLoCO: Learning Long Contexts Offline [63.3458260335454]
We propose LLoCO, a novel approach to processing long contexts.
LLoCO learns contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA.
Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens.
arXiv Detail & Related papers (2024-04-11T17:57:22Z)
- $\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens [64.08660301017302]
There is currently a lack of a standardized benchmark to evaluate this long-context capability.
$\infty$Bench is the first benchmark featuring an average data length surpassing 100K tokens.
The results indicate that existing long context LLMs still require significant advancements to effectively process 100K+ context.
arXiv Detail & Related papers (2024-02-21T11:30:29Z)
- On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z)
- BooookScore: A systematic exploration of book-length summarization in the era of LLMs [53.42917858142565]
We develop an automatic metric, BooookScore, that measures the proportion of sentences in a summary that do not contain any of the identified error types.
We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models.
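A minimal sketch of the metric as described above; the per-sentence error annotations are hypothetical inputs here, since producing them (with human or LLM annotators) is the substantive part of the paper's pipeline.

```python
def booookscore(sentence_errors: list[list[str]]) -> float:
    """Fraction of summary sentences annotated with no error type at all."""
    if not sentence_errors:
        return 0.0
    clean = sum(1 for errors in sentence_errors if not errors)
    return clean / len(sentence_errors)

# A three-sentence summary in which one sentence was flagged for an error type.
print(booookscore([[], ["entity omission"], []]))  # -> 0.666...
```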
arXiv Detail & Related papers (2023-10-01T20:46:44Z)
- You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
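For reference, the interpolation mentioned above combines the two distributions as $p(y \mid x) = \lambda\, p_{\mathrm{kNN}}(y \mid x) + (1 - \lambda)\, p_{\mathrm{LM}}(y \mid x)$. The sketch below shows only this mixing step; the datastore construction and nearest-neighbor search are omitted, and both distributions are supplied as hypothetical inputs.

```python
import numpy as np

def knn_lm_probs(p_lm: np.ndarray, p_knn: np.ndarray, lam: float = 0.25) -> np.ndarray:
    """Interpolate the base LM's next-token distribution with the kNN distribution."""
    return lam * p_knn + (1.0 - lam) * p_lm

p_lm = np.array([0.7, 0.2, 0.1])   # base LM distribution over a toy 3-token vocabulary
p_knn = np.array([0.1, 0.8, 0.1])  # distribution induced by the retrieved nearest neighbors
print(knn_lm_probs(p_lm, p_knn))   # [0.55 0.35 0.1 ]
```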
arXiv Detail & Related papers (2022-10-28T02:57:40Z)