Optimizing LLM Queries in Relational Workloads
- URL: http://arxiv.org/abs/2403.05821v1
- Date: Sat, 9 Mar 2024 07:01:44 GMT
- Title: Optimizing LLM Queries in Relational Workloads
- Authors: Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E.
Gonzalez, Ion Stoica, Matei Zaharia
- Abstract summary: We show how to optimize Large Language Model (LLM) inference for analytical workloads that invoke LLMs within relational queries.
We implement these optimizations in Apache Spark, with vLLM as the model serving backend.
We achieve up to a 4.4x improvement in end-to-end latency on a benchmark of diverse LLM-based queries on real datasets.
- Score: 58.254894049950366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analytical database providers (e.g., Redshift, Databricks, BigQuery) have
rapidly added support for invoking Large Language Models (LLMs) through native
user-defined functions (UDFs) to help users perform natural language tasks,
such as classification, entity extraction, and translation, inside analytical
workloads. For instance, an analyst might want to extract customer sentiments
on millions of product reviews. However, LLM inference is highly expensive in
both computational and economic terms: for example, an NVIDIA L4 GPU running
Llama2-7B can only process 6 KB of text per second. In this paper, we explore
how to optimize LLM inference for analytical workloads that invoke LLMs within
relational queries. We show that relational queries present novel opportunities
for accelerating LLM inference, including reordering rows to maximize key-value
(KV) cache reuse within the LLM inference engine, reordering columns within a
row to further increase cache reuse, and deduplicating redundant inference
requests. We implement these optimizations in Apache Spark, with vLLM as the
model serving backend, and achieve up to a 4.4x improvement in end-to-end latency
on a benchmark of diverse LLM-based queries on real datasets. To the best of
our knowledge, this is the first work to explicitly address the problem of
optimizing LLM invocations within SQL queries.
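
The abstract gives enough detail to sketch the three optimizations in code. Below is a minimal, illustrative Python sketch, not the authors' implementation: the function and field names are hypothetical, and the heuristics (sorting columns by cardinality, sorting rows lexicographically, deduplicating identical prompts) are assumptions consistent with the abstract's description of maximizing KV cache reuse.

```python
from collections import OrderedDict

def build_prompts(rows, columns, instruction):
    """Hypothetical sketch of the three query-level optimizations.

    rows:        list of dicts, one per table row
    columns:     names of the columns the LLM prompt references
    instruction: the fixed natural-language task description

    Returns an OrderedDict mapping each deduplicated prompt to the
    indices of the rows that share it, in an order that keeps rows
    with common prefixes adjacent.
    """
    # 1. Column reordering: put low-cardinality columns first so that
    #    many rows share a long common prompt prefix, which a
    #    prefix-caching engine such as vLLM can serve from its KV cache.
    cardinality = {c: len({row[c] for row in rows}) for c in columns}
    ordered_cols = sorted(columns, key=lambda c: cardinality[c])

    # 2. Row reordering: sort rows on the reordered column values so
    #    rows with identical prefixes are adjacent in the request stream.
    ordered_rows = sorted(
        enumerate(rows),
        key=lambda ir: tuple(str(ir[1][c]) for c in ordered_cols),
    )

    # 3. Deduplication: identical prompts trigger one inference call;
    #    the answer is fanned back out to every matching row.
    prompt_to_rows = OrderedDict()
    for idx, row in ordered_rows:
        fields = "\n".join(f"{c}: {row[c]}" for c in ordered_cols)
        prompt_to_rows.setdefault(f"{instruction}\n{fields}", []).append(idx)
    return prompt_to_rows
```

Issued in this order against an engine with prefix caching, consecutive requests recompute only their non-shared suffix tokens, and duplicate rows cost one inference call instead of many.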
Related papers
- Relational Database Augmented Large Language Model [59.38841050766026]
Large language models (LLMs) excel at many natural language processing (NLP) tasks, but they can only incorporate new knowledge through training or supervised fine-tuning.
The precise, up-to-date, and private information they often need is typically stored in relational databases.
arXiv Detail & Related papers (2024-07-21T06:19:10Z)
- CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search [67.6104548484555]
We introduce CHIQ, a two-step method that leverages the capabilities of open-source large language models (LLMs) to resolve ambiguities in the conversation history before query rewriting.
We demonstrate on five well-established benchmarks that CHIQ leads to state-of-the-art results across most settings.
arXiv Detail & Related papers (2024-06-07T15:23:53Z)
- Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations [2.699900017799093]
We focus on fine-tuning LLaMA, an open-source LLM, using proprietary documents and code from an enterprise repository.
As part of this work, we aim to guide beginners on how to get started with fine-tuning an LLM for documentation and code.
We also propose preprocessing recipes for both documentation and code to prepare datasets in different formats.
arXiv Detail & Related papers (2024-03-23T13:25:01Z)
- Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization [7.674972936853123]
We investigate whether combining multiple queries over the same input context into a single prompt, to minimize repeated calls, can be successfully used in meeting summarization (sketched after this list).
We observe that 100% reliability in generating responses in the expected format is usually limited to certain closed-source LLMs.
arXiv Detail & Related papers (2024-02-29T19:00:47Z)
- LLatrieval: LLM-Verified Retrieval for Verifiable Generation [67.93134176912477]
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents.
We propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question (a loop sketched after this list).
Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-11-14T01:38:02Z)
- SEED: Domain-Specific Data Curation With Large Language Models [22.54280367957015]
We present SEED, an LLM-as-compiler approach that automatically generates domain-specific data curation solutions via Large Language Models (LLMs).
SEED automatically selects from four LLM-assisted modules and forms a hybrid execution pipeline that best fits the task at hand.
arXiv Detail & Related papers (2023-10-01T17:59:20Z)
- Query Rewriting for Retrieval-Augmented Large Language Models [139.242907155883]
Large Language Models (LLMs) act as powerful, black-box readers in the retrieve-then-read pipeline.
This work introduces a new framework, Rewrite-Retrieve-Read, which replaces the previous retrieve-then-read pipeline for retrieval-augmented LLMs.
arXiv Detail & Related papers (2023-05-23T17:27:50Z)
- Large Language Models are Strong Zero-Shot Retriever [89.16756291653371]
We propose a simple method that applies a large language model (LLM) to large-scale retrieval in zero-shot scenarios.
Our method, the language model as Retriever (LameR), is built upon no neural models other than an LLM.
arXiv Detail & Related papers (2023-04-27T14:45:55Z)
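
As an aside, the multi-query instruction idea from Query-OPT above is easy to illustrate. The sketch below is not that paper's code: the prompt format and the answer parser are assumptions, and, as its abstract notes, only certain closed-source LLMs follow such a format with 100% reliability.

```python
import re

def multi_query_prompt(context, questions):
    # Pack several questions about one context into a single prompt,
    # instead of issuing one LLM call per question over the same context.
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    return (
        f"{context}\n\n"
        "Answer every question below, prefixing the answer to "
        "question i with 'Ai:'.\n"
        f"{numbered}"
    )

def split_answers(response, n):
    # Parse 'A1: ... A2: ...'-style output; deliberately simple and
    # brittle, since format compliance is the method's weak point.
    parts = re.split(r"\bA(\d+):", response)
    answers = {int(parts[i]): parts[i + 1].strip()
               for i in range(1, len(parts) - 1, 2)}
    return [answers.get(i + 1, "") for i in range(n)]
```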
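
Similarly, LLatrieval's verify-and-update loop can be sketched in a few lines. Here `retrieve`, `verify`, and `refine` are assumed LLM-backed callables; only the loop structure comes from the abstract.

```python
def llatrieval_loop(question, retrieve, verify, refine, max_rounds=3):
    # Keep updating the retrieval result until the LLM verifies that
    # the documents sufficiently support answering the question.
    docs = retrieve(question)
    for _ in range(max_rounds):
        if verify(question, docs):  # LLM judges sufficiency of support
            return docs
        docs = refine(question, docs)  # LLM updates the retrieval result
    return docs
```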
This list is automatically generated from the titles and abstracts of the papers on this site.