Xpose: Bi-directional Engineering for Hidden Query Extraction
- URL: http://arxiv.org/abs/2504.10898v1
- Date: Tue, 15 Apr 2025 06:17:58 GMT
- Title: Xpose: Bi-directional Engineering for Hidden Query Extraction
- Authors: Ahana Pradhan, Jayant Haritsa,
- Abstract summary: Hidden Query Extraction (HQE) has a spectrum of industrial use-cases including query recovery, database security, and vendor migration.<n>The reverse engineering (RE) tools developed for HQE, which are based on database mutation and generation techniques, can only extract flat queries with key-based equi joins and conjunctive arithmetic filter predicates.<n>We present Xpose, a HQE solution that elevates the extraction scope to realistic complex queries, such as those found in the TPCH benchmark.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query reverse engineering (QRE) aims to synthesize a SQL query to connect a given database and result instance. A recent variation of QRE is where an additional input, an opaque executable containing a ground-truth query, is provided, and the goal is to non-invasively extract this specific query through only input-output examples. This variant, called Hidden Query Extraction (HQE), has a spectrum of industrial use-cases including query recovery, database security, and vendor migration. The reverse engineering (RE) tools developed for HQE, which are based on database mutation and generation techniques, can only extract flat queries with key-based equi joins and conjunctive arithmetic filter predicates, making them limited wrt both query structure and query operators. In this paper, we present Xpose, a HQE solution that elevates the extraction scope to realistic complex queries, such as those found in the TPCH benchmark. A two-pronged approach is taken: (1) The existing RE scope is substantially extended to incorporate union connectors, algebraic filter predicates, and disjunctions for both values and predicates. (2) The predictive power of LLMs is leveraged to convert business descriptions of the opaque application into extraction guidance, representing ``forward engineering" (FE). The FE module recognizes common constructs, such as nesting of sub-queries, outer joins, and scalar functions. In essence, FE establishes the broad query contours, while RE fleshes out the fine-grained details. We have evaluated Xpose on (a) E-TPCH, a query suite comprising the complete TPCH benchmark extended with queries featuring unions, diverse join types, and sub-queries; and (b) the real-world STACK benchmark. The experimental results demonstrate that its bi-directional engineering approach accurately extracts these complex queries, representing a significant step forward with regard to HQE coverage.
Related papers
- RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF for Conversational QA over KGs with RAG [6.4032082023113475]
SPARQL is brittle for complex intents and conversational questions.<n>We propose a novel two-pronged system where we fuse: (i) SPARQL results over a database automatically derived from the knowledge graph, and (ii) text-search results over verbalizations of KG facts.<n>Our pipeline supports iterative retrieval: when the results of any branch are found to be unsatisfactory, the system can automatically opt for further rounds.<n>We demonstrate the superiority of our proposed system over several baselines on a knowledge graph of BMW automobiles.
arXiv Detail & Related papers (2024-12-23T16:16:30Z) - mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA [78.45521005703958]
multimodal Retrieval-Augmented Generation (mRAG) is naturally introduced to provide MLLMs with comprehensive and up-to-date knowledge.
We propose a novel framework called textbfRetrieval-textbfReftextbfAugmented textbfGeneration (mR$2$AG) which achieves adaptive retrieval and useful information localization.
mR$2$AG significantly outperforms state-of-the-art MLLMs on INFOSEEK and Encyclopedic-VQA
arXiv Detail & Related papers (2024-11-22T16:15:50Z) - DAGE: DAG Query Answering via Relational Combinator with Logical Constraints [24.60431781360608]
We propose a query embedding method for DAG queries called DAGE.
DAGE combines the possibly multiple paths between two nodes into a single path with a trainable operator.
We show that it is possible to implement DAGE on top of existing query embedding methods.
arXiv Detail & Related papers (2024-10-29T15:02:48Z) - Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering [48.43453390717167]
We present and tackle the problem of Embodied Question Answering with Situational Queries (S-EQA) in a household environment.<n>Unlike prior EQA work, situational queries require the agent to correctly identify multiple object-states and reach a consensus on their states for an answer.<n>We introduce a novel Prompt-Generate-Evaluate scheme that wraps around an LLM's output to generate unique situational queries and corresponding consensus object information.
arXiv Detail & Related papers (2024-05-08T00:45:20Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - RQFormer: Rotated Query Transformer for End-to-End Oriented Object Detection [26.37802649901314]
Oriented object detection presents a challenging task due to the presence of object instances with multiple orientations, varying scales, and dense distributions.
We propose an end-to-end oriented detector called the Rotated Query Transformer, which integrates two key technologies.
Experiments conducted on four remote sensing datasets and one scene text dataset demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-29T13:43:17Z) - Semantic Decomposition of Question and SQL for Text-to-SQL Parsing [2.684900573255764]
We propose a new modular Query Plan Language (QPL) that systematically decomposessql queries into simple and regular sub-queries.
Experimental results demonstrate that QPL is more effective than text-to-QPL for semantically equivalent queries.
arXiv Detail & Related papers (2023-10-20T15:13:34Z) - Optimizing Test-Time Query Representations for Dense Retrieval [34.61821330771046]
TOUR improves query representations guided by test-time retrieval results.
We leverage a cross-encoder re-ranker to provide fine-grained pseudo labels over retrieval results.
TOUR consistently improves direct re-ranking by up to 2.0% while running 1.3-2.4x faster.
arXiv Detail & Related papers (2022-05-25T11:39:42Z) - Query2Particles: Knowledge Graph Reasoning with Particle Embeddings [49.64006979045662]
We propose a query embedding method to answer complex logical queries on knowledge graphs with missing edges.
The answer entities are selected according to the similarities between the entity embeddings and the query embedding.
A complex KG query answering method, Q2P, is proposed to retrieve diverse answers from different areas over the embedding space.
arXiv Detail & Related papers (2022-04-27T11:16:08Z) - Learning Query Expansion over the Nearest Neighbor Graph [94.80212602202518]
Graph Query Expansion (GQE) is presented, which is learned in a supervised manner and performs aggregation over an extended neighborhood of the query.
The technique achieves state-of-the-art results over known benchmarks.
arXiv Detail & Related papers (2021-12-05T19:48:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.