Related papers: Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

URL: http://arxiv.org/abs/2406.17574v1
Date: Tue, 25 Jun 2024 14:14:35 GMT
Title: Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats
Authors: Ryan Pavlich, Nima Ebadi, Richard Tarbell, Billy Linares, Adrian Tan, Rachael Humphreys, Jayanta Kumar Das, Rambod Ghandiparsi, Hannah Haley, Jerris George, Rocky Slavin, Kim-Kwang Raymond Choo, Glenn Dietrich, Anthony Rios,
Abstract summary: We introduce a novel text-to-IoT dataset comprising 10,985 text-to-IoT pairs and 239,398 rows of network traffic activity. Our results show that joint training to query and infer information about the data can improve overall text-to-IoT performance.
Score: 24.62442027542548
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. While substantial progress has been made in this field, existing research has concentrated on generating SQL statements from text queries. The broader challenge, however, lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains additional query types limited in prior text-to-SQL datasets, notably temporal-related queries. Our dataset is sourced from a smart building's IoT ecosystem exploring sensor read and network traffic data. Second, our dataset allows two-stage processing, where the returned data (network traffic) from a generated SQL can be categorized as malicious or not. Our results show that joint training to query and infer information about the data can improve overall text-to-SQL performance, nearly matching substantially larger models. We also show that current large language models (e.g., GPT3.5) struggle to infer new information about returned data, thus our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.

Related papers

Text-to-SQL as Dual-State Reasoning: Integrating Adaptive Context and Progressive Generation [54.53145282349042]
We introduce DSR-sourced, a textbfDual-textbfS textbfReasoning framework that models Text-to-context as an interaction between an adaptive context state and a progressive generation state.<n>Without any post-training or in-context examples, DSR-sourced achieves competitive performance, reaching 35.28% execution accuracy on Spider 2.0-Snow and 68.32% on BIRD development set.
arXiv Detail & Related papers (2025-11-26T13:52:50Z)
Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries [36.92547259037192]
The proliferation of unstructured data poses a fundamental challenge to traditional database infrastructure.<n>While Text-to-BIRD has democratized access to structured data, it remains incapable of interpreting semantic or multi-modal queries.<n>We introduce and formalize Text2 Vector, a novel task to establish a unified natural language for seamlessly querying both structured and unstructured data.
arXiv Detail & Related papers (2025-06-29T03:17:42Z)
Bridging the Gap: Enabling Natural Language Queries for NoSQL Databases through Text-to-NoSQL Translation [25.638927795540454]
We introduce the Text-to-No task, which aims to convert natural language queries into accessible queries. To promote research in this area, we released a large-scale and open-source dataset for this task, named TEND (short interfaces for Text-to-No dataset) We also designed a SLM (Small Language Model)-assisted and RAG (Retrieval-augmented Generation)-assisted multi-step framework called SMART, which is specifically designed for Text-to-No conversion.
arXiv Detail & Related papers (2025-02-16T17:01:48Z)
A Survey of Large Language Model-Based Generative AI for Text-to-SQL: Benchmarks, Applications, Use Cases, and Challenges [0.7889270818022226]
Text-to-one systems facilitate smooth interaction with databases by translating natural language queries into Structured Query Language (technical) This survey provides an overview of the evolution of AI-driven text-to-one systems. We examine the applications of text-to-one in domains like healthcare, education, and finance.
arXiv Detail & Related papers (2024-12-06T17:36:28Z)
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going? [32.84561352339466]
We provide a review of Text-to- translation techniques powered by Large Language Models (LLMs)<n>We discuss the research challenges and open problems of Text-to- evaluation in the LLMs era.
arXiv Detail & Related papers (2024-08-09T14:59:36Z)
Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [15.75829309721909]
Generating accuratesql from natural language questions (text-to-) is a long-standing challenge. PLMs have been developed and utilized for text-to- tasks, achieving promising performance. Recently, large language models (LLMs) have demonstrated significant capabilities in natural language understanding.
arXiv Detail & Related papers (2024-06-12T17:13:17Z)
Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-- systems in practice. It is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022. All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces the framework for enhancing Text-to- filtering using large language models (LLMs) With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses. With instruction fine-tuning, we delve deep in understanding the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-domain systems. It is composed of publicly available text-to-domain datasets and 29K databases. Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing [55.10009651476589]
Speech-to-Spider (S2Spider) aims to convert spoken questions intosql queries given databases. We propose the first direct speech-to-speaker parsing model Wav2 which avoids error compounding across cascaded systems. Experimental results demonstrate that Wav2 avoids error compounding and achieves state-of-the-art results by up to 2.5% accuracy improvement over the baseline.
arXiv Detail & Related papers (2023-05-21T19:26:46Z)
Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play [46.07002748587857]
We explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new interactions. We find that self-play improves the accuracy of a strong baseline on SParC and Co, two widely used text-to-domain datasets.
arXiv Detail & Related papers (2022-10-21T16:40:07Z)
STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing [64.80483736666123]
We propose a novel pre-training framework STAR for context-dependent text-to- parsing. In addition, we construct a large-scale context-dependent text-to-the-art conversation corpus to pre-train STAR. Extensive experiments show that STAR achieves new state-of-the-art performance on two downstream benchmarks.
arXiv Detail & Related papers (2022-10-21T11:30:07Z)
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-corpora parsing is to convert a natural language (NL) question to its corresponding structured query language () based on the evidences provided by databases. Deep neural networks have significantly advanced this task by neural generation models, which automatically learn a mapping function from an input NL question to an output query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.