Domain Specific Question to SQL Conversion with Embedded Data Balancing Technique
- URL: http://arxiv.org/abs/2504.08753v1
- Date: Fri, 28 Mar 2025 08:58:14 GMT
- Title: Domain Specific Question to SQL Conversion with Embedded Data Balancing Technique
- Authors: Jyothi, T. Satyanarayana Murthy,
- Abstract summary: This paper proposes two intermediations to improve the accuracy of structured query language models.<n>The proposed solution achieved 10.98 percentage improvement in accuracy of the model performance compared to the state of the art model tested on Wiki dataset.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The rise of deep learning in natural language processing has fostered the creation of text to structured query language models composed of an encoder and a decoder. Researchers have experimented with various intermediate processing like schema linking, table type aware, value extract. To generate accurate SQL results for the user question. However error analysis performed on the failed cases on these systems shows, 29 percentage of the errors would be because the system was unable to understand the values expressed by the user in their question. This challenge affects the generation of accurate SQL queries, especially when dealing with domain-specific terms and specific value conditions, where traditional methods struggle to maintain consistency and precision. To overcome these obstacles, proposed two intermediations like implementing data balancing technique and over sampling domain-specific queries which would refine the model architecture to enhance value recognition and fine tuning the model for domain-specific questions. This proposed solution achieved 10.98 percentage improvement in accuracy of the model performance compared to the state of the art model tested on WikiSQL dataset. to convert the user question accurately to SQL queries. Applying oversampling technique on the domain-specific questions shown a significant improvement as compared with traditional approaches.
Related papers
- SQLCritic: Correcting Text-to-SQL Generation via Clause-wise Critic [0.8098097078441623]
We propose a novel approach combining structured execution feedback with a trained critic agent that provides detailed, interpretable critiques.<n>This method effectively identifies and corrects both syntactic and semantic errors, enhancing accuracy and interpretability.
arXiv Detail & Related papers (2025-03-11T02:52:39Z) - Confidence Estimation for Error Detection in Text-to-SQL Systems [5.636160825241556]
This study investigates the integration of selective classifiers into Text-to-learning systems.
We show that encoder-decoder T5 is better calibrated than in-context GPT 4 and decoder-only Llama 3.
In terms of error detection, selective classifier with a higher probability detects errors associated with irrelevant questions rather than incorrect query generations.
arXiv Detail & Related papers (2025-01-16T13:23:07Z) - Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - Context-Aware SQL Error Correction Using Few-Shot Learning -- A Novel Approach Based on NLQ, Error, and SQL Similarity [0.0]
This paper introduces a novel few-shot learning-based approach for error correction insql generation.
It enhances the accuracy of generated queries by selecting the most suitable few-shot error correction examples for a given natural language question (NLQ)
In experiments with the open-source dataset, the proposed model offers a 39.2% increase in fixing errors with no error correction and a 10% increase from a simple error correction method.
arXiv Detail & Related papers (2024-10-11T18:22:08Z) - DAC: Decomposed Automation Correction for Text-to-SQL [51.48239006107272]
We introduce De Automation Correction (DAC), which corrects text-to-composed by decomposing entity linking and skeleton parsing.
We show that our method improves performance by $3.7%$ on average of Spider, Bird, and KaggleDBQA compared with the baseline method.
arXiv Detail & Related papers (2024-08-16T14:43:15Z) - ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling [0.0]
We introduce an entropy-based method to identify and filter out unanswerable results.
We experimentally verified that our method can filter unanswerable questions, which can be widely utilized.
arXiv Detail & Related papers (2024-04-25T14:55:07Z) - TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring [11.78795632771211]
We introduce a novel benchmark designed to evaluate text-to- reliability as a model's ability to correctly handle any type of input question.
We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches.
arXiv Detail & Related papers (2024-03-23T16:12:52Z) - Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-Spider (S2Spider) aims to convert spoken questions intosql queries given databases.
We propose the first direct speech-to-speaker parsing model Wav2 which avoids error compounding across cascaded systems.
Experimental results demonstrate that Wav2 avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z) - Controllable Data Augmentation for Context-Dependent Text-to-SQL [46.11511797999039]
We introduce ConDA, which generates interactive questions and correspondingsql results.
We also present a filter method to ensure the data quality by a grounding model.
We analyze the augmented data, which reveals that the data generated by ConDA are of high quality.
arXiv Detail & Related papers (2023-04-27T01:00:10Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-dependence by exploring the intrinsic uncertainties in the neural network based approaches (called SUN)
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers [26.15889661083109]
We present KDBaggleQA, a new cross-domain evaluation dataset of real Web databases.
We show that KDBaggleQA presents a challenge to state-of-the-art zero-shots but that a more realistic evaluation setting and creative use of associated database documentation boosts their accuracy by over 13.2%.
arXiv Detail & Related papers (2021-06-22T00:08:03Z) - Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.