Locality-Sensitive Hashing for Efficient Web Application Security Testing
- URL: http://arxiv.org/abs/2001.01128v1
- Date: Sat, 4 Jan 2020 21:05:15 GMT
- Title: Locality-Sensitive Hashing for Efficient Web Application Security Testing
- Authors: Ilan Ben-Bassat and Erez Rokah
- Abstract summary: We present a novel approach to detect redundant content for security testing purposes.
The algorithm applies locality-sensitive hashing using MinHash sketches in order to analyze the Document Object Model (DOM) structure of web pages.
Our experimental results show that this approach allows a successful scan of RIAs that cannot be crawled otherwise.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Web application security has become a major concern in recent years, as more
and more content and services are available online. A useful method for
identifying security vulnerabilities is black-box testing, which relies on an
automated crawling of web applications. However, crawling Rich Internet
Applications (RIAs) is a very challenging task. One of the key obstacles
crawlers face is the state similarity problem: how to determine if two
client-side states are equivalent. As current methods do not completely solve
this problem, a successful scan of many real-world RIAs is still not possible.
We present a novel approach to detect redundant content for security testing
purposes. The algorithm applies locality-sensitive hashing using MinHash
sketches in order to analyze the Document Object Model (DOM) structure of web
pages, and to efficiently estimate similarity between them. Our experimental
results show that this approach allows a successful scan of RIAs that cannot be
crawled otherwise.
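To make the core idea concrete, the following is a minimal Python sketch of MinHash-based similarity estimation over DOM structure. It represents each page as the set of its root-to-node tag paths and compares fixed-size sketches to estimate Jaccard similarity; the feature choice, the sketch size NUM_HASHES, and the seeded-hash construction are illustrative assumptions, not the paper's exact implementation.

```python
import hashlib
from html.parser import HTMLParser

NUM_HASHES = 64  # sketch size; an illustrative choice, not the paper's setting

class TagPathExtractor(HTMLParser):
    """Collects root-to-node tag paths as a set of DOM-structure features."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

def minhash_sketch(features):
    """Minimum of each seeded hash approximates one random permutation."""
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{f}".encode(),
                                           digest_size=8).digest(), "big")
            for f in features)
        for seed in range(NUM_HASHES)
    ]

def estimated_jaccard(s1, s2):
    """The fraction of agreeing sketch slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(s1, s2)) / NUM_HASHES

def dom_paths(html):
    parser = TagPathExtractor()
    parser.feed(html)
    return parser.paths

page_a = "<html><body><div><a></a></div></body></html>"
page_b = "<html><body><div><a></a></div><span></span></body></html>"
sim = estimated_jaccard(minhash_sketch(dom_paths(page_a)),
                        minhash_sketch(dom_paths(page_b)))
print(f"estimated DOM similarity: {sim:.2f}")  # true Jaccard here is 0.80
```

Two client-side states whose estimated similarity exceeds a chosen threshold can then be treated as equivalent and explored only once during the crawl.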
Related papers
- Beyond Browsing: API-Based Web Agents [58.39129004543844]
API-based agents outperform web browsing agents in experiments on WebArena.
Hybrid Agents outperform both nearly uniformly across tasks.
Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
arXiv Detail & Related papers (2024-10-21T19:46:06Z)
- WebAssembly and Security: a review [0.8962460460173961]
We aim to fill this gap by proposing a comprehensive review of research works dealing with security in WebAssembly.
We analyze 121 papers by identifying seven different security categories.
arXiv Detail & Related papers (2024-07-17T03:37:28Z)
- CRATOR: a Dark Web Crawler [1.7224362150588657]
This study proposes a general dark web crawler designed to extract pages while handling security mechanisms such as CAPTCHAs.
Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content.
arXiv Detail & Related papers (2024-05-10T09:39:12Z)
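As a rough illustration of the seed-list-plus-link-analysis loop described in the CRATOR entry above, here is a hedged Python sketch of a breadth-first crawler. It is not CRATOR's implementation: CAPTCHA handling, dark-web transport, and vulnerability scanning are omitted, and the seed URL is a placeholder.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=50):
    """Breadth-first crawl from a seed list; returns the visited URLs."""
    frontier, seen, visited = deque(seeds), set(seeds), []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable page; skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # link analysis: resolve and dedupe
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited

print(crawl(["https://example.com"], max_pages=5))
```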
- Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models [16.89878267176532]
Offensive AI is a paradigm that integrates AI-based technologies in cyber attacks.
In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework.
Our experiments -- conducted in a testbed consisting of 1 million URLs from different web application domains -- demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
arXiv Detail & Related papers (2024-04-22T12:40:38Z)
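To give a flavor of how a language model can prioritize directory enumeration, the sketch below ranks brute-force candidates with a toy character-bigram model trained on a handful of known path names. The paper's framework presumably uses a far more capable LM; the training list, smoothing constant, and scoring function here are all hypothetical stand-ins.

```python
import math
from collections import Counter

# Known paths from similar applications serve as training data for a toy
# character-bigram model -- a stand-in for the paper's language model.
known_paths = ["admin", "login", "api", "backup", "config", "upload", "static"]

bigrams, unigrams = Counter(), Counter()
for path in known_paths:
    padded = f"^{path}$"          # ^ and $ mark word boundaries
    unigrams.update(padded[:-1])  # count context characters only
    bigrams.update(zip(padded, padded[1:]))

def avg_log_prob(word):
    """Add-one-smoothed bigram log-probability per character transition."""
    padded = f"^{word}$"
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + 64))  # rough smoothing
        for a, b in zip(padded, padded[1:]))
    return total / (len(padded) - 1)

# Rank candidates so the most plausible names are requested first.
candidates = ["administrator", "uploads", "zzxq", "apiv2", "adm1n"]
for word in sorted(candidates, key=avg_log_prob, reverse=True):
    print(f"{avg_log_prob(word):7.2f}  {word}")
```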
- AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- SoK: Analysis techniques for WebAssembly [0.0]
WebAssembly is a low-level bytecode language that allows languages like C, C++, and Rust to be executed in the browser at near-native performance.
Vulnerabilities in memory-unsafe languages, like C and C++, can translate into vulnerabilities in WebAssembly binaries.
WebAssembly has been used for malicious purposes like cryptojacking.
arXiv Detail & Related papers (2024-01-11T14:28:13Z)
- Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
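The WEBEMBED entry above swaps hand-tuned similarity thresholds for a classifier learned over page embeddings. The hedged sketch below captures that general shape: tag-frequency vectors stand in for the paper's neural embeddings, and a logistic-regression classifier trained on labeled page pairs decides near-duplication; every feature choice and training pair here is an illustrative assumption.

```python
from collections import Counter
from html.parser import HTMLParser

import numpy as np
from sklearn.linear_model import LogisticRegression

TAGS = ["html", "body", "div", "a", "span", "form", "input", "table"]

class TagCounter(HTMLParser):
    """Counts tag occurrences as a crude stand-in for a neural page embedding."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()
    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def embed(html):
    parser = TagCounter()
    parser.feed(html)
    return np.array([parser.counts[t] for t in TAGS], dtype=float)

def pair_features(a, b):
    """Absolute embedding difference; the classifier learns which gaps matter."""
    return np.abs(embed(a) - embed(b))

# Toy training pairs: (page1, page2, is_near_duplicate)
pairs = [
    ("<div><a></a></div>", "<div><a></a><a></a></div>", 1),
    ("<div><a></a></div>", "<form><input></form>", 0),
    ("<table></table>", "<table></table>", 1),
    ("<span></span>", "<div><form><input></form></div>", 0),
]
X = np.stack([pair_features(a, b) for a, b, _ in pairs])
y = [label for _, _, label in pairs]

clf = LogisticRegression().fit(X, y)  # replaces a hand-tuned threshold
print(clf.predict([pair_features("<div><a></a></div>", "<div><a></a></div>")]))
```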
- Leveraging AI to optimize website structure discovery during Penetration Testing [2.2049183478692584]
We propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence.
We use semantic clustering techniques to organize wordlist items into groups according to their semantic meaning.
Results show a performance increase of up to 50% in each of the conducted experiments.
arXiv Detail & Related papers (2021-01-18T18:21:42Z)
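As a loose illustration of the wordlist-clustering step from the penetration-testing entry above, the sketch below groups dirbusting candidates with KMeans over character n-gram TF-IDF vectors. The paper uses semantic clustering, so this lexical feature choice is only a stand-in, and the wordlist and cluster count are invented for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# A tiny wordlist of candidate paths, as used in dirbusting.
wordlist = ["admin", "administrator", "login", "signin", "backup",
            "backups", "old-backup", "api", "api-v2", "apidocs"]

# Character n-grams are a crude stand-in for semantic embeddings.
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(wordlist)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
for cluster in range(3):
    print(cluster, [w for w, l in zip(wordlist, labels) if l == cluster])
```

Grouped candidates can then be probed cluster by cluster, so a hit in one group steers the attack toward semantically related names.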
- MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We propose a deep learning-based network termed MixNet to detect presentation attacks.
The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search [102.67142711824748]
CATCH is a novel Context-bAsed meTa reinforcement learning algorithm for transferrable arChitecture searcH.
The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces.
It is also capable of cross-domain architecture search, identifying competitive networks on ImageNet, COCO, and Cityscapes.
arXiv Detail & Related papers (2020-07-18T09:35:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.