Enhancing Webshell Detection With Deep Learning-Powered Methods
- URL: http://arxiv.org/abs/2412.05532v1
- Date: Sat, 07 Dec 2024 04:26:36 GMT
- Title: Enhancing Webshell Detection With Deep Learning-Powered Methods
- Authors: Ha L. Viet, On V. Phung, Hoa N. Nguyen
- Abstract summary: Webshell attacks are becoming more common, requiring robust detection mechanisms to protect web applications. The dissertation proposes ASAF, an advanced DL-Powered Source-Code Scanning Framework that uses signature-based methods and deep learning algorithms to detect known and unknown webshells. Second, the dissertation introduces a deep neural network that detects webshells using real-time HTTP traffic analysis of web applications.
- Score: 0.6390468088226495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Webshell attacks are becoming more common, requiring robust detection mechanisms to protect web applications. The dissertation clearly states two research directions: scanning web application source code and analyzing HTTP traffic to detect webshells. First, the dissertation proposes ASAF, an advanced DL-Powered Source-Code Scanning Framework that uses signature-based methods and deep learning algorithms to detect known and unknown webshells. We designed the framework to enable programming language-specific detection models. The dissertation used PHP as the interpreted language and ASP.NET as the compiled language to build a complete ASAF-based model for experimentation and comparison with other research results to prove its efficacy. Second, the dissertation introduces a deep neural network that detects webshells using real-time HTTP traffic analysis of web applications. The study proposes an algorithm to improve the deep learning model's loss function to address data imbalance. We tested and compared the model with other studies on the CSE-CIC-IDS2018 dataset to prove its efficacy. We integrated the model with NetIDPS to improve webshell identification, automatically blacklisting attack source IPs and blocking URIs that query webshells on the web server to prevent these attacks.
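The abstract mentions an algorithm that modifies the deep learning model's loss function to cope with data imbalance but does not spell it out here. As a minimal illustrative sketch only (not the dissertation's actual algorithm), a class-weighted focal loss is one common way to pursue the same goal; the function name, `pos_weight`, and `gamma` below are assumptions for illustration.

```python
# Minimal sketch of a class-weighted focal loss for imbalanced webshell/benign traffic.
# NOT the dissertation's algorithm; pos_weight and gamma are assumed example values.
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                        pos_weight: float = 10.0, gamma: float = 2.0) -> torch.Tensor:
    # Per-sample binary cross-entropy on raw logits (targets are 0/1 floats).
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                            # probability assigned to the true class
    alpha = targets * pos_weight + (1.0 - targets)   # up-weight the rare webshell class
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()

# Example usage: loss = weighted_focal_loss(model(batch_features), batch_labels.float())
```

The weighting term raises the cost of missing the minority (webshell) class, while the focal term down-weights easy, well-classified benign samples; the dissertation's actual loss modification may differ.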
Related papers
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability [60.81964498221952]
WebThinker is a deep research agent that empowers large reasoning models to autonomously search the web, navigate web pages, and draft research reports during the reasoning process.
It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time.
Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z) - Poster: Long PHP webshell files detection based on sliding window attention [7.20974772731121]
We first convert PHP source code to opcodes and then extract Opcode Double-Tuples (ODTs).
To address the challenge that deep learning methods have difficulty detecting long webshell files, we introduce a sliding window attention mechanism.
Experimental results show that our method reaches high accuracy in webshell detection.
arXiv Detail & Related papers (2025-02-26T16:04:17Z) - AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z) - Large Language Models are Few-shot Generators: Proposing Hybrid Prompt Algorithm To Generate Webshell Escape Samples [1.6223257916285212]
We propose the Hybrid Prompt algorithm for webshell escape sample generation with the help of large language models.
As a prompt algorithm specifically developed for webshell sample generation, the Hybrid Prompt algorithm not only combines prompt ideas such as Chain of Thought and Tree of Thought, but also incorporates components such as a webshell hierarchical module.
Experimental results show that the Hybrid Prompt algorithm can work with multiple LLMs with excellent code reasoning ability to generate high-quality webshell samples.
arXiv Detail & Related papers (2024-02-12T04:59:58Z) - Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based, algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - Detecting Cloud-Based Phishing Attacks by Combining Deep Learning Models [0.0]
Web-based phishing attacks nowadays exploit popular cloud web hosting services and apps, such as Google Sites and Typeform, to host their attacks.
Here we investigate the effectiveness of deep learning models in detecting this class of cloud-based phishing attacks.
arXiv Detail & Related papers (2022-04-05T18:47:57Z) - A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.932750332087746]
Existing information on vulnerabilities, attacks, controls, and advisories available on the web provides an opportunity to represent knowledge and perform security analytics.
Ontology enrichment algorithms based on natural language processing and ML models have issues with contextual extraction of concepts in words, phrases, and sentences.
Bidirectional LSTMs trained on a large DBpedia dataset and a 2.8 GB Wikipedia corpus, together with the Universal Sentence Encoder, are deployed to enrich the ISO-based information security ontology.
arXiv Detail & Related papers (2021-12-16T01:32:21Z) - OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns.
This necessitates dynamic and automated enrichment of information security ontologies.
Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
arXiv Detail & Related papers (2021-02-08T09:43:05Z) - Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify if a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z) - Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)