Locality-Sensitive Hashing for Efficient Web Application Security Testing
- URL: http://arxiv.org/abs/2001.01128v1
- Date: Sat, 4 Jan 2020 21:05:15 GMT
- Title: Locality-Sensitive Hashing for Efficient Web Application Security Testing
- Authors: Ilan Ben-Bassat and Erez Rokah
- Abstract summary: We present a novel approach to detect redundant content for security testing purposes.
The algorithm applies locality-sensitive hashing using MinHash sketches in order to analyze the Document Object Model (DOM) structure of web pages.
Our experimental results show that this approach allows a successful scan of RIAs that cannot be crawled otherwise.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Web application security has become a major concern in recent years, as more
and more content and services are available online. A useful method for
identifying security vulnerabilities is black-box testing, which relies on an
automated crawling of web applications. However, crawling Rich Internet
Applications (RIAs) is a very challenging task. One of the key obstacles
crawlers face is the state similarity problem: how to determine if two
client-side states are equivalent. As current methods do not completely solve
this problem, a successful scan of many real-world RIAs is still not possible.
We present a novel approach to detect redundant content for security testing
purposes. The algorithm applies locality-sensitive hashing using MinHash
sketches in order to analyze the Document Object Model (DOM) structure of web
pages, and to efficiently estimate similarity between them. Our experimental
results show that this approach allows a successful scan of RIAs that cannot be
crawled otherwise.
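To make the core idea concrete, the following is a minimal Python sketch of MinHash-based similarity estimation over DOM structure. It represents each page as the set of its root-to-node tag paths and compares fixed-size sketches to estimate Jaccard similarity; the feature choice, the sketch size NUM_HASHES, and the seeded-hash construction are illustrative assumptions, not the paper's exact implementation.

```python
import hashlib
from html.parser import HTMLParser

NUM_HASHES = 64  # sketch size; an illustrative choice, not the paper's setting

class TagPathExtractor(HTMLParser):
    """Collects root-to-node tag paths as a set of DOM-structure features."""
    def __init__(self):
        super().__init__()
        self.stack, self.paths = [], set()
    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

def minhash_sketch(features):
    """Minimum of each seeded hash approximates one random permutation."""
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{f}".encode(),
                                           digest_size=8).digest(), "big")
            for f in features)
        for seed in range(NUM_HASHES)
    ]

def estimated_jaccard(s1, s2):
    """The fraction of agreeing sketch slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(s1, s2)) / NUM_HASHES

def dom_paths(html):
    parser = TagPathExtractor()
    parser.feed(html)
    return parser.paths

page_a = "<html><body><div><a></a></div></body></html>"
page_b = "<html><body><div><a></a></div><span></span></body></html>"
sim = estimated_jaccard(minhash_sketch(dom_paths(page_a)),
                        minhash_sketch(dom_paths(page_b)))
print(f"estimated DOM similarity: {sim:.2f}")  # true Jaccard here is 0.80
```

Two client-side states whose estimated similarity exceeds a chosen threshold can then be treated as equivalent and explored only once during the crawl.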
Related papers
- Beyond Browsing: API-Based Web Agents [58.39129004543844]
API-based agents outperform web browsing agents in experiments on WebArena.
Hybrid Agents outperform both nearly uniformly across tasks.
Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
arXiv Detail & Related papers (2024-10-21T19:46:06Z)
- WebAssembly and Security: a review [0.8962460460173961]
We aim to fill this gap by proposing a comprehensive review of research works dealing with security in WebAssembly.
We analyze 121 papers by identifying seven different security categories.
arXiv Detail & Related papers (2024-07-17T03:37:28Z)
- CRATOR: a Dark Web Crawler [1.7224362150588657]
This study proposes a general dark web crawler designed to extract pages while handling security mechanisms such as CAPTCHAs.
Our approach uses a combination of seed URL lists, link analysis, and scanning to discover new content.
arXiv Detail & Related papers (2024-05-10T09:39:12Z)
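As a rough illustration of the seed-list-plus-link-analysis loop described in the CRATOR entry above, here is a hedged Python sketch of a breadth-first crawler. It is not CRATOR's implementation: CAPTCHA handling, dark-web transport, and vulnerability scanning are omitted, and the seed URL is a placeholder.

```python
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, max_pages=50):
    """Breadth-first crawl from a seed list; returns the visited URLs."""
    frontier, seen, visited = deque(seeds), set(seeds), []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable page; skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # link analysis: resolve and dedupe
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited

print(crawl(["https://example.com"], max_pages=5))
```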
- Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models [16.89878267176532]
Offensive AI is a paradigm that integrates AI-based technologies in cyber attacks.
In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework.
Our experiments -- conducted in a testbed consisting of 1 million URLs from different web application domains -- demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
arXiv Detail & Related papers (2024-04-22T12:40:38Z)
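To give a flavor of how a language model can prioritize directory enumeration, the sketch below ranks brute-force candidates with a toy character-bigram model trained on a handful of known path names. The paper's framework presumably uses a far more capable LM; the training list, smoothing constant, and scoring function here are all hypothetical stand-ins.

```python
import math
from collections import Counter

# Known paths from similar applications serve as training data for a toy
# character-bigram model -- a stand-in for the paper's language model.
known_paths = ["admin", "login", "api", "backup", "config", "upload", "static"]

bigrams, unigrams = Counter(), Counter()
for path in known_paths:
    padded = f"^{path}$"          # ^ and $ mark word boundaries
    unigrams.update(padded[:-1])  # count context characters only
    bigrams.update(zip(padded, padded[1:]))

def avg_log_prob(word):
    """Add-one-smoothed bigram log-probability per character transition."""
    padded = f"^{word}$"
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + 64))  # rough smoothing
        for a, b in zip(padded, padded[1:]))
    return total / (len(padded) - 1)

# Rank candidates so the most plausible names are requested first.
candidates = ["administrator", "uploads", "zzxq", "apiv2", "adm1n"]
for word in sorted(candidates, key=avg_log_prob, reverse=True):
    print(f"{avg_log_prob(word):7.2f}  {word}")
```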
- AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- SoK: Analysis techniques for WebAssembly [0.0]
WebAssembly is a low-level bytecode language that allows languages like C, C++, and Rust to be executed in the browser at near-native performance.
Vulnerabilities in memory-unsafe languages, like C and C++, can translate into vulnerabilities in WebAssembly binaries.
WebAssembly has been used for malicious purposes like cryptojacking.
arXiv Detail & Related papers (2024-01-11T14:28:13Z)
- Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
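The WEBEMBED entry above swaps hand-tuned similarity thresholds for a classifier learned over page embeddings. The hedged sketch below captures that general shape: tag-frequency vectors stand in for the paper's neural embeddings, and a logistic-regression classifier trained on labeled page pairs decides near-duplication; every feature choice and training pair here is an illustrative assumption.

```python
from collections import Counter
from html.parser import HTMLParser

import numpy as np
from sklearn.linear_model import LogisticRegression

TAGS = ["html", "body", "div", "a", "span", "form", "input", "table"]

class TagCounter(HTMLParser):
    """Counts tag occurrences as a crude stand-in for a neural page embedding."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()
    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def embed(html):
    parser = TagCounter()
    parser.feed(html)
    return np.array([parser.counts[t] for t in TAGS], dtype=float)

def pair_features(a, b):
    """Absolute embedding difference; the classifier learns which gaps matter."""
    return np.abs(embed(a) - embed(b))

# Toy training pairs: (page1, page2, is_near_duplicate)
pairs = [
    ("<div><a></a></div>", "<div><a></a><a></a></div>", 1),
    ("<div><a></a></div>", "<form><input></form>", 0),
    ("<table></table>", "<table></table>", 1),
    ("<span></span>", "<div><form><input></form></div>", 0),
]
X = np.stack([pair_features(a, b) for a, b, _ in pairs])
y = [label for _, _, label in pairs]

clf = LogisticRegression().fit(X, y)  # replaces a hand-tuned threshold
print(clf.predict([pair_features("<div><a></a></div>", "<div><a></a></div>")]))
```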
- Leveraging AI to optimize website structure discovery during Penetration Testing [2.2049183478692584]
We propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence.
We use semantic clustering techniques to organize wordlist items into groups according to their semantic meaning.
Results show a performance increase of up to 50% in each of the conducted experiments.
arXiv Detail & Related papers (2021-01-18T18:21:42Z)
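As a loose illustration of the wordlist-clustering step from the penetration-testing entry above, the sketch below groups dirbusting candidates with KMeans over character n-gram TF-IDF vectors. The paper uses semantic clustering, so this lexical feature choice is only a stand-in, and the wordlist and cluster count are invented for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# A tiny wordlist of candidate paths, as used in dirbusting.
wordlist = ["admin", "administrator", "login", "signin", "backup",
            "backups", "old-backup", "api", "api-v2", "apidocs"]

# Character n-grams are a crude stand-in for semantic embeddings.
vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)).fit_transform(wordlist)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)
for cluster in range(3):
    print(cluster, [w for w, l in zip(wordlist, labels) if l == cluster])
```

Grouped candidates can then be probed cluster by cluster, so a hit in one group steers the attack toward semantically related names.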
- MixNet for Generalized Face Presentation Attack Detection [63.35297510471997]
We propose a deep learning-based network termed MixNet to detect presentation attacks.
The proposed algorithm utilizes state-of-the-art convolutional neural network architectures and learns the feature mapping for each attack category.
arXiv Detail & Related papers (2020-10-25T23:01:13Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search [102.67142711824748]
CATCH is a novel Context-bAsed meTa reinforcement learning algorithm for transferrable arChitecture searcH.
The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces.
It is also capable of cross-domain architecture search, identifying competitive networks on ImageNet, COCO, and Cityscapes.
arXiv Detail & Related papers (2020-07-18T09:35:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.