Semantic Caching and Intent-Driven Context Optimization for Multi-Agent Natural Language to Code Systems
- URL: http://arxiv.org/abs/2601.11687v1
- Date: Fri, 16 Jan 2026 11:32:20 GMT
- Title: Semantic Caching and Intent-Driven Context Optimization for Multi-Agent Natural Language to Code Systems
- Authors: Harmohit Singh
- Abstract summary: We present a production-optimized multi-agent system designed to translate natural language queries into executable Python code for structured data analytics. Unlike systems that rely on expensive frontier models, our approach achieves high accuracy and cost efficiency through three key innovations. We describe the architecture, present empirical results from production deployment, and discuss practical considerations for deploying LLM-based analytics systems at scale.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a production-optimized multi-agent system designed to translate natural language queries into executable Python code for structured data analytics. Unlike systems that rely on expensive frontier models, our approach achieves high accuracy and cost efficiency through three key innovations: (1) a semantic caching system with LLM-based equivalence detection and structured adaptation hints that provides cache hit rates of 67% on production queries; (2) a dual-threshold decision mechanism that separates exact-match retrieval from reference-guided generation; and (3) an intent-driven dynamic prompt assembly system that reduces token consumption by 40-60% through table-aware context filtering. The system has been deployed in production for enterprise inventory management, processing over 10,000 queries with an average latency of 8.2 seconds and 94.3% semantic accuracy. We describe the architecture, present empirical results from production deployment, and discuss practical considerations for deploying LLM-based analytics systems at scale.
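The dual-threshold mechanism described in innovation (2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the threshold values, the bag-of-words `embed` stand-in (the paper uses LLM-based equivalence detection), and all names are assumptions chosen for the sketch.

```python
import math
from dataclasses import dataclass, field

def embed(text: str) -> dict:
    # Toy bag-of-words embedding; a production system would use a
    # sentence-embedding model or LLM-based equivalence check instead.
    vec: dict = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class SemanticCache:
    # Illustrative thresholds; the paper does not report exact values.
    exact_threshold: float = 0.95      # above: reuse cached code verbatim
    reference_threshold: float = 0.75  # above: cached code guides generation
    entries: list = field(default_factory=list)  # (query, embedding, code)

    def put(self, query: str, code: str) -> None:
        self.entries.append((query, embed(query), code))

    def lookup(self, query: str):
        qvec = embed(query)
        best_sim, best_code = 0.0, None
        for _, vec, code in self.entries:
            sim = cosine(qvec, vec)
            if sim > best_sim:
                best_sim, best_code = sim, code
        if best_code is None or best_sim < self.reference_threshold:
            return ("generate", None)        # miss: full code generation
        if best_sim >= self.exact_threshold:
            return ("exact", best_code)      # hit: return cached code
        return ("reference", best_code)      # near hit: reference-guided generation
```

A query similar but not identical to a cached one (e.g. adding a grouping column) lands between the two thresholds, so the cached code is passed to the generator as a reference together with adaptation hints rather than returned verbatim.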
Related papers
- MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [54.236614097082395]
We introduce MEnvAgent, a framework for automated environment construction. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures. MEnvData-SWE is the largest open-source polyglot dataset of realistic verifiable Docker environments to date.
arXiv Detail & Related papers (2026-01-30T11:36:10Z)
- Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages [61.18573330164572]
System prompts provide a lightweight yet powerful mechanism for conditioning large language models (LLMs) at inference time. This paper presents a comprehensive study of how different system prompts steer models toward accurate and robust cross-lingual behavior.
arXiv Detail & Related papers (2025-12-02T14:54:54Z)
- Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization [35.60979104539273]
Nirvana is a multi-modal data analytics framework that incorporates programmable semantic operators. Nirvana reduces end-to-end runtime by 10%-85% and system processing costs by 76% on average.
arXiv Detail & Related papers (2025-11-25T01:41:49Z)
- RAG-Driven Data Quality Governance for Enterprise ERP Systems [0.0]
We present an end-to-end pipeline combining automated data cleaning with LLM-driven query generation. The system is deployed on a production system managing 240,000 employee records over six months. This modular architecture provides a reproducible framework for AI-native enterprise data governance.
arXiv Detail & Related papers (2025-11-18T12:08:44Z)
- MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning [82.14973479594367]
Large Language Models (LLMs) for complex reasoning tasks require innovative approaches that bridge intuitive and deliberate cognitive processes. This paper introduces a Multi-Agent System for Deep ReSearch (MARS) enabling seamless integration of System 1's fast, intuitive thinking with System 2's deliberate reasoning.
arXiv Detail & Related papers (2025-10-06T15:42:55Z)
- AutoMaAS: Self-Evolving Multi-Agent Architecture Search for Large Language Models [4.720605681761044]
AutoMaAS is a self-evolving multi-agent architecture search framework. It uses neural architecture search principles to automatically discover optimal agent configurations. It achieves a 1.0-7.1% performance improvement and reduces inference costs by 3-5% compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-10-03T01:57:07Z)
- Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query. Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications. We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
- RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory [57.449129198822476]
RCR is a role-aware context routing framework for multi-agent large language model (LLM) systems. It dynamically selects semantically relevant memory subsets for each agent based on its role and task stage. A lightweight scoring policy guides memory selection, and agent outputs are integrated into a shared memory store.
arXiv Detail & Related papers (2025-08-06T21:59:34Z)
- QUPID: Quantified Understanding for Enhanced Performance, Insights, and Decisions in Korean Search Engines [4.94507535566914]
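Role-aware memory selection of the kind this summary describes can be sketched as below. The tag-overlap scoring policy, the weights, and all names are illustrative assumptions for the sketch, not RCR-Router's actual method.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    tags: set  # e.g. {"schema"}, {"planning"}, {"error"}

def score(item: MemoryItem, role_tags: set, stage_tags: set) -> float:
    # Lightweight scoring policy: weighted overlap between an item's tags
    # and the agent's role and current task stage.
    return 2.0 * len(item.tags & role_tags) + 1.0 * len(item.tags & stage_tags)

def route(memory: list, role_tags: set, stage_tags: set, k: int = 3) -> list:
    # Give each agent only its top-k relevant memory items, dropping
    # anything with zero relevance to its role or stage.
    ranked = sorted(memory, key=lambda m: score(m, role_tags, stage_tags),
                    reverse=True)
    return [m for m in ranked[:k] if score(m, role_tags, stage_tags) > 0]
```

Routing only the relevant subset, rather than the full shared memory, is what keeps per-agent context (and token cost) bounded as the memory store grows.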
We show that combining two distinct small language models (SLMs) with different architectures can outperform large language models (LLMs) in relevance assessment. Our approach -- QUPID -- integrates a generative SLM with an embedding-based SLM, achieving higher relevance judgment accuracy.
arXiv Detail & Related papers (2025-05-12T08:35:09Z)
- ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics. Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
- Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs [29.735465300269993]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning. This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through iterative feedback between LLMs and Answer Set Programming (ASP). We evaluate our approach on two benchmark datasets: StepGame and SparQA.
arXiv Detail & Related papers (2024-11-27T18:04:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.