Advancing SQL Injection Detection for High-Speed Data Centers: A Novel Approach Using Cascaded NLP
- URL: http://arxiv.org/abs/2312.13041v1
- Date: Wed, 20 Dec 2023 14:09:13 GMT
- Title: Advancing SQL Injection Detection for High-Speed Data Centers: A Novel Approach Using Cascaded NLP
- Authors: Kasim Tasdemir, Rafiullah Khan, Fahad Siddiqui, Sakir Sezer, Fatih Kurugollu, Sena Busra Yengec-Tasdemir, Alperen Bolat
- Abstract summary: We introduce a novel cascade SQLi detection method, blending classical and transformer-based NLP models.
Our method achieves 99.86% detection accuracy with significantly lower computational demands, running 20 times faster than transformer-based models alone.
- Score: 1.281563056333574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting SQL Injection (SQLi) attacks is crucial for web-based data center security, but it is challenging to balance accuracy and computational efficiency, especially in high-speed networks. Traditional methods struggle with this balance, while NLP-based approaches, although accurate, are computationally intensive. We introduce a novel cascade SQLi detection method, blending classical and transformer-based NLP models, that achieves 99.86% detection accuracy with significantly lower computational demands, running 20 times faster than transformer-based models alone. Our approach is tested in a realistic setting and compared with 35 other methods, including Machine Learning-based and transformer models like BERT, on a dataset of over 30,000 SQL sentences. Our results show that this hybrid method detects SQLi effectively in high-traffic environments, offering accurate and computationally efficient protection against SQLi vulnerabilities. The code is available at https://github.com/gdrlab/cascaded-sqli-detection .
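The cascade idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token weights, the thresholds `low` and `high`, and both scoring functions are hypothetical stand-ins, and the second stage is a simple rule that takes the place of a transformer classifier. The point is only the control flow: a cheap model answers confident cases, and the expensive model runs on the uncertain remainder.

```python
import re
from typing import Callable, Tuple

# Hypothetical token weights for the cheap first stage (illustrative only).
SUSPICIOUS_TOKENS = {
    "union": 0.4, "select": 0.2, "or": 0.2,
    "--": 0.3, "drop": 0.4, "sleep": 0.4,
}


def cheap_score(query: str) -> float:
    """Stage 1: crude SQLi score in [0, 1] from weighted token presence."""
    tokens = set(re.findall(r"--|\w+", query.lower()))
    return min(sum(SUSPICIOUS_TOKENS.get(t, 0.0) for t in tokens), 1.0)


def expensive_score(query: str) -> float:
    """Stage 2 stand-in for a transformer classifier (hypothetical).

    Here it only flags tautologies such as `1=1`.
    """
    return 1.0 if re.search(r"(\w+)\s*=\s*\1\b", query.lower()) else 0.0


def cascade_predict(
    query: str,
    low: float = 0.25,
    high: float = 0.75,
    stage2: Callable[[str], float] = expensive_score,
) -> Tuple[bool, str]:
    """Return (is_sqli, stage_used); defer to stage 2 only when stage 1 is unsure."""
    p = cheap_score(query)
    if p >= high:
        return True, "stage1"
    if p <= low:
        return False, "stage1"
    return stage2(query) >= 0.5, "stage2"


if __name__ == "__main__":
    for q in [
        "SELECT name FROM users WHERE id = 5",
        "1 UNION SELECT password FROM users -- ",
        "SELECT * FROM t WHERE a = 1 OR 1=1",
    ]:
        print(cascade_predict(q), q)
```

In a sketch like this, the width of the band between `low` and `high` controls the trade-off: a narrower band sends fewer queries to the expensive stage, at the cost of trusting the cheap stage more often.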
Related papers
- Enhancing SQL Injection Detection and Prevention Using Generative Models [4.424836140281847]
This paper introduces an innovative approach that leverages generative models to enhance SQLi detection and prevention mechanisms.
By incorporating Variational Autoencoders (VAE), Conditional Wasserstein GAN with Gradient Penalty (CWGAN-GP), and U-Net, synthetic SQL queries were generated to augment training datasets for machine learning models.
arXiv Detail & Related papers (2025-02-07T09:43:43Z) - Flow-based Detection of Botnets through Bio-inspired Optimisation of Machine Learning [0.5735035463793009]
Botnets could autonomously infect, propagate, communicate and coordinate with other members in the botnet.
Traditional detection methods are becoming increasingly unsuitable against various network-based detection evasion methods.
This research explores the application of network flow-based behavioural modelling to facilitate the binary classification of bot network activity.
arXiv Detail & Related papers (2024-12-07T15:55:49Z) - Learning from Imperfect Data: Towards Efficient Knowledge Distillation of Autoregressive Language Models for Text-to-SQL [83.99974309930072]
Knowledge distillation (KD) is a common approach, which aims to distill the larger teacher model into a smaller student model.
We propose to improve the KD with Imperfect Data, namely KID, which effectively boosts the performance without introducing much training budget.
KID can not only achieve consistent and significant performance gains across all model types and sizes, but also effectively improve the training efficiency.
arXiv Detail & Related papers (2024-10-15T07:51:00Z) - Context-Aware SQL Error Correction Using Few-Shot Learning -- A Novel Approach Based on NLQ, Error, and SQL Similarity [0.0]
This paper introduces a novel few-shot learning-based approach for error correction in SQL generation.
It enhances the accuracy of generated queries by selecting the most suitable few-shot error correction examples for a given natural language question (NLQ).
In experiments with the open-source dataset, the proposed model offers a 39.2% increase in fixing errors over a baseline with no error correction and a 10% increase over a simple error correction method.
arXiv Detail & Related papers (2024-10-11T18:22:08Z) - Inferring Data Preconditions from Deep Learning Models for Trustworthy Prediction in Deployment [25.527665632625627]
It is important to reason about the trustworthiness of the model's predictions with unseen data during deployment.
Existing methods for specifying and verifying traditional software are insufficient for this task.
We propose a novel technique that uses rules derived from neural network computations to infer data preconditions.
arXiv Detail & Related papers (2024-01-26T03:47:18Z) - Machine Learning Force Fields with Data Cost Aware Training [94.78998399180519]
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation.
Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels.
We propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.
arXiv Detail & Related papers (2023-06-05T04:34:54Z) - Synergistic Self-supervised and Quantization Learning [24.382347077407303]
We propose a method called synergistic self-supervised and quantization learning (SSQL) to pretrain quantization-friendly self-supervised models.
By only training once, SSQL can then benefit various downstream tasks at different bit-widths simultaneously.
arXiv Detail & Related papers (2022-07-12T09:55:10Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique.
The performance of the considered algorithms is evaluated using the ISCX 2012 dataset.
Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data on different devices without moving it to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as texts) are more challenging than for continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.