Unlocking User-oriented Pages: Intention-driven Black-box Scanner for Real-world Web Applications
- URL: http://arxiv.org/abs/2504.20801v2
- Date: Wed, 30 Apr 2025 08:23:01 GMT
- Title: Unlocking User-oriented Pages: Intention-driven Black-box Scanner for Real-world Web Applications
- Authors: Weizhe Wang, Yao Zhang, Kaitai Liang, Guangquan Xu, Hongpeng Bai, Qingyang Yan, Xi Zheng, Bin Wu
- Abstract summary: Hoyen is a black-box scanner that uses a Large Language Model to predict user intention. Hoyen has been rigorously evaluated on 12 popular open-source web applications and compared with 6 representative tools.
- Score: 16.223807733708767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box scanners have played a significant role in detecting vulnerabilities in web applications. A key focus in current black-box scanning is increasing test coverage (i.e., accessing more web pages). However, since many web applications are user-oriented, some deep pages can only be accessed through complex user interactions, which existing black-box scanners struggle to reproduce. To fill this gap, a key insight is that web pages contain a wealth of semantic information that can aid in understanding potential user intention. Based on this insight, we propose Hoyen, a black-box scanner that uses a Large Language Model to predict user intention and provide guidance for expanding the scanning scope. Hoyen has been rigorously evaluated on 12 popular open-source web applications and compared with 6 representative tools. The results demonstrate that Hoyen performs a comprehensive exploration of web applications, expanding the attack surface while achieving about 2x the coverage of other scanners on average, with high request accuracy. Furthermore, Hoyen directed over 90% of its requests towards the core functionality of the application and detected more vulnerabilities than other scanners, including unique vulnerabilities in well-known web applications. Our data/code is available at https://hoyen.tjunsl.com/
Related papers
- Multi-Record Web Page Information Extraction From News Websites [83.88591755871734]
In this paper, we focus on the problem of extracting information from web pages containing many records.
To address this gap, we created a large-scale, open-access dataset specifically designed for list pages.
Our dataset contains 13,120 web pages with news lists, significantly exceeding existing datasets in both scale and complexity.
arXiv Detail & Related papers (2025-02-20T15:05:00Z) - Beyond Browsing: API-Based Web Agents [58.39129004543844]
API-based agents outperform web browsing agents in experiments on WebArena.
Hybrid agents outperform both nearly uniformly across tasks.
Results strongly suggest that when APIs are available, they present an attractive alternative to relying on web browsing alone.
arXiv Detail & Related papers (2024-10-21T19:46:06Z) - How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users [50.699390248359265]
Browser fingerprinting can be used to identify and track users across the Web, even without cookies.
This technique and resulting privacy risks have been studied for over a decade.
We provide a first-of-its-kind dataset to enable further research.
arXiv Detail & Related papers (2024-10-09T14:51:58Z) - Unveiling the Digital Fingerprints: Analysis of Internet attacks based on website fingerprints [0.0]
We show that, using the newest machine learning algorithms, an attacker can deanonymize Tor traffic by applying such techniques.
We capture network packets across 11 days while users navigate specific web pages, recording data in .pcapng format through the Wireshark network capture tool.
arXiv Detail & Related papers (2024-09-01T18:44:40Z) - Fuzzing Frameworks for Server-side Web Applications: A Survey [3.522950356329991]
This study reviews the state-of-the-art fuzzing frameworks for testing web applications through web API.
We collect papers from seven online repositories of peer-reviewed articles over the last ten years.
arXiv Detail & Related papers (2024-06-05T12:45:02Z) - When the Few Outweigh the Many: Illicit Content Recognition with Few-Shot Learning [0.0]
This paper investigates an alternative technique for recognizing illegal activities from images.
Siamese neural networks reach 90.9% on 20-Shot experiments over a 10-class dataset.
arXiv Detail & Related papers (2023-11-28T18:28:03Z) - User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z) - Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
arXiv Detail & Related papers (2023-06-12T19:59:36Z) - Reproducing Random Forest Efficacy in Detecting Port Scanning [0.0]
Port scanning is a method used by hackers to identify vulnerabilities in a network or system.
It is important to detect port scanning because it is often the first step in a cyber attack.
Researchers have worked for over a decade to develop robust methods to detect port scanning.
arXiv Detail & Related papers (2023-02-18T12:28:53Z) - Trainable Structure Tensors for Autonomous Baggage Threat Detection Under Extreme Occlusion [45.39173572825739]
This paper presents a novel instance segmentation framework that utilizes trainable structure tensors to highlight the contours of the occluded and cluttered contraband items.
It is the only framework that has been validated on combined grayscale and colored scans obtained from four different types of X-ray scanners.
arXiv Detail & Related papers (2020-09-28T09:12:10Z) - Adversarial EXEmples: A Survey and Experimental Evaluation of Practical Attacks on Machine Learning for Windows Malware Detection [67.53296659361598]
Adversarial EXEmples can bypass machine learning-based detection by perturbing relatively few input bytes.
We develop a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks.
These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section.
arXiv Detail & Related papers (2020-08-17T07:16:57Z) - Locality-Sensitive Hashing for Efficient Web Application Security Testing [0.0]
We present a novel approach to detect redundant content for security testing purposes.
The algorithm applies locality-sensitive hashing using MinHash sketches in order to analyze the Document Object Model (DOM) structure of web pages.
Our experimental results show that this approach allows a successful scan of RIAs that cannot be crawled otherwise.
arXiv Detail & Related papers (2020-01-04T21:05:15Z)