DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
- URL: http://arxiv.org/abs/2402.01117v1
- Date: Fri, 2 Feb 2024 03:21:00 GMT
- Title: DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
- Authors: Mohammadreza Pourreza and Davood Rafiei
- Abstract summary: We introduce a novel two-stage fine-tuning approach that decomposes the task into two simpler tasks.
We show that this approach improves execution accuracy by 3 to 7 percent.
- Score: 7.388002745070808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leading models for the text-to-SQL task heavily rely on proprietary Large
Language Models (LLMs), posing concerns over data privacy. Closing the
performance gap between small open-source models and large proprietary models
is crucial to mitigate this reliance. To this end, we introduce a novel
two-stage fine-tuning approach that decomposes the task into two simpler tasks.
Through comprehensive evaluation on two large cross-domain datasets and two
small LLMs, we show that this approach improves execution accuracy by 3 to 7
percent, effectively aligning the performance of open-source models with their
proprietary counterparts.
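DTS-SQL's decomposition splits text-to-SQL into schema linking followed by SQL generation, each handled by a separately fine-tuned small model. The snippet below is a minimal sketch of such a two-stage pipeline; the model paths, prompt templates, and parsing helpers are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a decomposed (two-stage) text-to-SQL pipeline:
# stage 1 links the question to the relevant tables, stage 2 generates SQL
# over that reduced schema. Model paths, prompts, and parsing are assumptions.
from transformers import pipeline

schema_linker = pipeline("text-generation", model="path/to/schema-linking-sft")  # fine-tuned small LLM
sql_generator = pipeline("text-generation", model="path/to/sql-generation-sft")  # fine-tuned small LLM

def text_to_sql(question: str, full_schema: str) -> str:
    # Stage 1: schema linking -- predict which tables the question needs.
    link_prompt = (
        "Given the database schema and a question, list the relevant tables.\n"
        f"Schema:\n{full_schema}\nQuestion: {question}\nTables:"
    )
    linked = schema_linker(link_prompt, max_new_tokens=64)[0]["generated_text"]
    relevant_tables = linked.split("Tables:")[-1].strip()

    # Stage 2: SQL generation -- condition only on the linked (smaller) schema.
    sql_prompt = (
        "Write a SQL query answering the question using only these tables.\n"
        f"Tables: {relevant_tables}\nQuestion: {question}\nSQL:"
    )
    generated = sql_generator(sql_prompt, max_new_tokens=128)[0]["generated_text"]
    return generated.split("SQL:")[-1].strip()
```

Fine-tuning each stage separately keeps both sub-tasks simple enough for small open-source models, which is the core of the reported accuracy gain.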
Related papers
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization [65.64108848398696]
We introduce a preference optimization process to enhance the multimodal reasoning capabilities of MLLMs.
We develop a simple yet effective method, termed Mixed Preference Optimization (MPO), which boosts multimodal CoT performance.
Our model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista, outperforming InternVL2-8B by 8.7 points and achieving performance comparable to the 10x larger InternVL2-76B.
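For orientation, the core of a preference-optimization step is a pairwise loss that pushes the policy to rank the preferred response above the rejected one relative to a frozen reference model; MPO mixes such a term with additional objectives. The DPO-style snippet below is a generic, assumption-level illustration, not the paper's exact loss.

```python
# Hedged sketch of the pairwise preference term behind preference optimization
# (DPO-style). MPO combines this kind of term with further objectives; this is
# a generic illustration, not the paper's exact formulation.
import torch.nn.functional as F

def preference_loss(policy_chosen_logp, policy_rejected_logp,
                    ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Inputs are summed log-probabilities of full responses (torch tensors)."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # Encourage the policy to widen the chosen-vs-rejected margin beyond the reference model's.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```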
arXiv Detail & Related papers (2024-11-15T18:59:27Z)
- Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model [16.03304915788997]
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts.
Existing methods for JMERE require large amounts of labeled data.
We introduce the Knowledge-Enhanced Cross-modal Prompt Model.
arXiv Detail & Related papers (2024-10-18T07:14:54Z)
- Synthesizing Text-to-SQL Data from Weak and Strong LLMs [68.69270834311259]
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks.
We introduce a synthetic data approach that combines data produced by larger, more powerful models with error-information data generated by smaller, less well-aligned models.
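As a rough illustration of how strong-model and weak-model outputs might be combined into training data, the sketch below keeps strong-model SQL as supervised targets and turns weak-model execution failures into (chosen, rejected) pairs; the helper callables are assumptions, not the paper's actual pipeline.

```python
# Hedged sketch of mixing "strong" and "weak" LLM outputs into training data:
# strong-model SQL serves as supervised targets, while weak-model execution
# failures are paired against the strong output as preference-style error data.
def build_training_data(examples, strong_generate, weak_generate, executes_correctly):
    """examples: iterable of (question, schema) pairs.
    strong_generate / weak_generate: callables (question, schema) -> SQL string.
    executes_correctly: callable (sql, schema) -> bool, e.g. run against the database."""
    sft_examples, preference_pairs = [], []
    for question, schema in examples:
        strong_sql = strong_generate(question, schema)
        weak_sql = weak_generate(question, schema)
        # Strong-model output becomes a supervised fine-tuning target.
        sft_examples.append({"question": question, "schema": schema, "sql": strong_sql})
        # Weak-model failures supply error information for preference-style training.
        if executes_correctly(strong_sql, schema) and not executes_correctly(weak_sql, schema):
            preference_pairs.append({"question": question, "chosen": strong_sql, "rejected": weak_sql})
    return sft_examples, preference_pairs
```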
arXiv Detail & Related papers (2024-08-06T15:40:32Z)
- Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation [1.8876415010297893]
Data-to-text (D2T) generation aims to generate human-readable text from semi-structured data, such as tables and graphs.
No research has been conducted to illustrate the impact of model size on the performance of fine-tuned LLMs for D2T tasks.
We aim to elucidate both the advantages and limitations of scaling model sizes across five widely used D2T datasets.
arXiv Detail & Related papers (2024-07-19T07:54:30Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
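One common instantiation of model merging is a simple linear interpolation of two models' parameters. The sketch below shows that form with placeholder checkpoint paths; the paper's actual merging method may be more sophisticated.

```python
# Hedged sketch of weight-space model merging via linear interpolation of
# parameters. Checkpoint paths are placeholders; the paper's merging method
# may differ (e.g. task-vector or TIES-style merging).
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("path/to/llama-2-7b-language-ct")  # language ability
task = AutoModelForCausalLM.from_pretrained("path/to/llama-2-7b-task-sft")     # task-solving ability

alpha = 0.5  # interpolation weight between the two models
task_state = task.state_dict()
merged_state = {
    name: alpha * param + (1.0 - alpha) * task_state[name]
    for name, param in base.state_dict().items()
}

base.load_state_dict(merged_state)            # 'base' now holds the merged weights
base.save_pretrained("path/to/merged-model")
```

No additional training is needed, which is what makes merging attractive when labeled data in the target language is extremely scarce.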
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
- TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale [66.01943465390548]
We introduce TriSum, a framework for distilling large language models' text summarization abilities into a compact, local model.
Our method enhances local model performance on various benchmarks.
It also improves interpretability by providing insights into the summarization rationale.
arXiv Detail & Related papers (2024-03-15T14:36:38Z)
- Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm [19.06214756792692]
In-context learning of large-language models (LLMs) has achieved remarkable success in the field of natural language processing.
Case studies reveal that the single-step chain-of-thought approach faces challenges such as attention diffusion and inadequate performance in complex tasks like text-to-SQL.
A workflow paradigm is proposed, aiming to enhance the attention and problem-solving scope of LLMs through decomposition.
arXiv Detail & Related papers (2024-02-16T13:24:05Z)
- Open, Closed, or Small Language Models for Text Classification? [10.186568241388331]
We evaluate three classes of models using eight datasets across three distinct NLP tasks.
Open-source models can rival their closed-source counterparts by fine-tuning.
This study underscores the importance of model selection based on task requirements.
arXiv Detail & Related papers (2023-08-19T18:58:32Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
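The general draft-then-verify pattern behind this line of work can be sketched as follows: a small model proposes a short block of tokens and the large model checks them, taking over at the first disagreement. This is a generic illustration, not BiLD's exact fallback and rollback policy.

```python
# Generic sketch of big-little (speculative) decoding: a small model drafts a
# short block of tokens and a large model verifies them. Real implementations
# verify all draft positions with a single batched forward pass of the large
# model; here verification is written per token for clarity.
def speculative_decode(small_next_token, big_next_token, prompt_tokens, max_len, draft_len=4):
    """small_next_token / big_next_token: callables mapping a token list to the next token id."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_len:
        # Small (draft) model proposes a block of tokens cheaply.
        draft = []
        for _ in range(draft_len):
            draft.append(small_next_token(tokens + draft))
        # Large model checks the draft position by position.
        accepted = []
        for proposed in draft:
            verified = big_next_token(tokens + accepted)
            if verified == proposed:
                accepted.append(proposed)   # draft token confirmed
            else:
                accepted.append(verified)   # disagreement: keep the big model's token
                break
        tokens.extend(accepted)
    return tokens[:max_len]
```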
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data.
Based on experimental results, neural semantic parsers that leverage the GAP model obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)