Related papers: Cost-Aware Text-to-SQL: An Empirical Study of Cloud Compute Costs for LLM-Generated Queries

Cost-Aware Text-to-SQL: An Empirical Study of Cloud Compute Costs for LLM-Generated Queries

URL: http://arxiv.org/abs/2512.22364v1
Date: Fri, 26 Dec 2025 19:51:35 GMT
Title: Cost-Aware Text-to-SQL: An Empirical Study of Cloud Compute Costs for LLM-Generated Queries
Authors: Saurabh Deochake, Debajyoti Mukhopadhyay,
Abstract summary: Text-to- systems powered by Large Language Models (LLMs) achieve high accuracy on standard benchmarks.<n>Existing efficiency metrics such as the Valid Efficiency Score (VES) measure execution time rather than the consumption-based costs of cloud data warehouses.<n>We evaluate six state-of-the-art LLMs across 180 query executions on Google BigQuery using the StackOverflow dataset (230GB), measuring bytes processed, slot utilization, and estimated cost.
Score: 0.2578242050187029
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Text-to-SQL systems powered by Large Language Models (LLMs) achieve high accuracy on standard benchmarks, yet existing efficiency metrics such as the Valid Efficiency Score (VES) measure execution time rather than the consumption-based costs of cloud data warehouses. This paper presents the first systematic evaluation of cloud compute costs for LLM-generated SQL queries. We evaluate six state-of-the-art LLMs across 180 query executions on Google BigQuery using the StackOverflow dataset (230GB), measuring bytes processed, slot utilization, and estimated cost. Our analysis yields three key findings: (1) reasoning models process 44.5% fewer bytes than standard models while maintaining equivalent correctness (96.7%-100%); (2) execution time correlates weakly with query cost (r=0.16), indicating that speed optimization does not imply cost optimization; and (3) models exhibit up to 3.4x cost variance, with standard models producing outliers exceeding 36GB per query. We identify prevalent inefficiency patterns including missing partition filters and unnecessary full-table scans, and provide deployment guidelines for cost-sensitive enterprise environments.

Related papers

Making Databases Faster with LLM Evolutionary Sampling [27.62392938968789]
Traditional query optimization relies on cost-based models that estimate execution cost.<n>We use our DBPlan harness for the DataFusion engine to propose localized edits that can be applied and executed.<n>We then apply an evolutionary search over these edits to refine candidates across iterations.<n>We obtain up to 4.78$times$ speedups on some queries and we demonstrate a small-to-large workflow.
arXiv Detail & Related papers (2026-02-11T00:21:51Z)
Semantic Caching and Intent-Driven Context Optimization for Multi-Agent Natural Language to Code Systems [0.0]
We present a production-optimized multi-agent system designed to translate natural language queries into executable Python code for structured data analytics.<n>Unlike systems that rely on expensive frontier models, our approach achieves high accuracy and cost efficiency through three key innovations.<n>We describe the architecture, present empirical results from production deployment, and discuss practical considerations for deploying LLM-based analytics systems at scale.
arXiv Detail & Related papers (2026-01-16T11:32:20Z)
Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data [57.996437077411315]
We study the reasoning behavior of large language models (LLMs) under limited computation budgets.<n>We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase.<n> Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models.
arXiv Detail & Related papers (2026-01-16T07:09:30Z)
Cortex AISQL: A Production SQL Engine for Unstructured Data [11.480345698642006]
AI is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.<n>We show how AI-aware query optimization treats AI inference cost as a first-class optimization objective.<n>Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model.<n>Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear.
arXiv Detail & Related papers (2025-11-10T22:14:13Z)
EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing counts and context windows incur prohibitive compute, energy, and monetary costs.<n>We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
Cost-Optimal Grouped-Query Attention for Long-Context Modeling [45.981681856747365]
Grouped-Query Attention (GQA) is a widely adopted strategy for reducing the computational cost of attention layers in large language models.<n>We analyze the relationship among context length, model size, GQA configuration, and model loss.<n>We propose a recipe for deriving cost-optimal GQA configurations.
arXiv Detail & Related papers (2025-03-12T17:50:42Z)
CHESS: Contextual Harnessing for Efficient SQL Synthesis [1.9506402593665235]
We introduce CHESS, a framework for efficient and scalable text-to- queries. It comprises four specialized agents, each targeting one of the aforementioned challenges. Our framework offers features that adapt to various deployment constraints.
arXiv Detail & Related papers (2024-05-27T01:54:16Z)
Budget-aware Query Tuning: An AutoML Perspective [14.561951257365953]
Modern database systems rely on cost-based querys to come up with good execution plans for input queries. We show that by varying the costunit values one can obtain query plans that significantly outperform the default query plans.
arXiv Detail & Related papers (2024-03-29T20:19:36Z)
Optimizing LLM Queries in Relational Data Analytics Workloads [50.95919232839785]
Batch data analytics is a growing application for Large Language Models (LLMs)<n>LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets.<n>We propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads.
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning [58.71541261221863]
Join order selection (JOS) is the problem of ordering join operations to minimize total query execution cost. We present JoinGym, a query optimization environment for bushy reinforcement learning (RL) Under the hood, JoinGym simulates a query plan's cost by looking up intermediate result cardinalities from a pre-computed dataset.
arXiv Detail & Related papers (2023-07-21T17:00:06Z)
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency. We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question. Our experiments show that Adaptive-Consistency reduces sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs [66.30706841821123]
Large language models (LLMs) power many state-of-the-art systems in natural language processing. LLMs are extremely computationally expensive, even at inference time. We propose a new metric for comparing inference efficiency across models.
arXiv Detail & Related papers (2023-05-03T21:51:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.