Leveraging AI to optimize website structure discovery during Penetration
Testing
- URL: http://arxiv.org/abs/2101.07223v1
- Date: Mon, 18 Jan 2021 18:21:42 GMT
- Title: Leveraging AI to optimize website structure discovery during Penetration
Testing
- Authors: Diego Antonelli, Roberta Cascella, Gaetano Perrone, Simon Pietro
Romano, Antonio Schiano
- Abstract summary: We propose an advanced technique to optimize the dirbusting process by leveraging Artificial Intelligence.
We use semantic clustering techniques in order to organize wordlist items in different groups according to their semantic meaning.
Results show a performance increase of up to 50% in each of the conducted experiments.
- Score: 2.2049183478692584
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dirbusting is a technique used to brute force directories and file names on
web servers while monitoring HTTP responses, in order to enumerate server
contents. Such a technique uses lists of common words to discover the hidden
structure of the target website. Dirbusting typically relies on response codes
as discovery conditions to find new pages. It is widely used in web application
penetration testing, an activity that allows companies to detect website
vulnerabilities. Dirbusting techniques are both time- and resource-consuming, and
innovative approaches have never been explored in this field. We hence propose
an advanced technique to optimize the dirbusting process by leveraging
Artificial Intelligence. More specifically, we use semantic clustering
techniques in order to organize wordlist items in different groups according to
their semantic meaning. The created clusters are used in an ad-hoc implemented
next-word intelligent strategy. This paper demonstrates that the usage of
clustering techniques outperforms the commonly used brute force methods.
Performance is evaluated by testing eight different web applications. Results
show a performance increase of up to 50% in each of the conducted
experiments.
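The abstract does not spell out the clustering method or the next-word strategy, so the following is only a minimal sketch of the general idea under stated assumptions: words are assigned to clusters by hand (standing in for semantic clustering), and whenever a probe succeeds, the remaining words of the same cluster are tried next. The names `WORDLIST`, `CLUSTER_OF`, `EXISTING`, `exists`, and both ordering functions are hypothetical. The target is simulated with a set lookup; a real tool would send an HTTP request and inspect the response code instead.

```python
from collections import deque

# Hypothetical toy data: a small wordlist, hand-made clusters, and a simulated target.
WORDLIST = ["admin", "images", "login", "css", "backup",
            "logout", "js", "register", "fonts", "profile"]
CLUSTER_OF = {
    "login": "auth", "logout": "auth", "register": "auth", "profile": "auth",
    "images": "static", "css": "static", "js": "static", "fonts": "static",
    "admin": "ops", "backup": "ops",
}
EXISTING = {"login", "logout", "register", "profile"}


def exists(word):
    """Simulated discovery condition (stands in for an HTTP response-code check)."""
    return word in EXISTING


def probe_order_brute_force(wordlist):
    """Baseline: probe every word in the order the wordlist gives."""
    return list(wordlist)


def probe_order_cluster_guided(wordlist, cluster_of, exists):
    """Probe in list order, but after a hit try the hit's cluster-mates next."""
    probed = set()
    order = []
    promoted = deque()           # cluster-mates of earlier hits, waiting to be probed
    remaining = deque(wordlist)  # plain list order, used when nothing is promoted
    while promoted or remaining:
        word = promoted.popleft() if promoted else remaining.popleft()
        if word in probed:
            continue
        probed.add(word)
        order.append(word)
        if exists(word):
            for mate in wordlist:
                if (mate not in probed and mate not in promoted
                        and cluster_of[mate] == cluster_of[word]):
                    promoted.append(mate)
    return order


def requests_until_all_found(order, exists):
    """Number of probes issued by the time the last existing path is discovered."""
    last_hit = 0
    for i, word in enumerate(order, start=1):
        if exists(word):
            last_hit = i
    return last_hit


baseline = requests_until_all_found(probe_order_brute_force(WORDLIST), exists)
guided = requests_until_all_found(
    probe_order_cluster_guided(WORDLIST, CLUSTER_OF, exists), exists)
print("brute force:", baseline, "probes; cluster-guided:", guided, "probes")
```

On this toy data the baseline needs 10 probes to find all four existing paths, while the cluster-guided order needs 6. These numbers only illustrate the mechanism and say nothing about the paper's measured results.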
Related papers
- Improving Retrieval in Sponsored Search by Leveraging Query Context Signals [6.152499434499752]
We propose an approach to enhance query understanding by augmenting queries with rich contextual signals.
We use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations.
Our context-aware approach substantially outperforms context-free models.
arXiv Detail & Related papers (2024-07-19T14:28:53Z)
- Offensive AI: Enhancing Directory Brute-forcing Attack with the Use of Language Models [16.89878267176532]
Offensive AI is a paradigm that integrates AI-based technologies in cyber attacks.
In this work, we explore whether AI can enhance the directory enumeration process and propose a novel Language Model-based framework.
Our experiments -- conducted in a testbed consisting of 1 million URLs from different web application domains -- demonstrate the superiority of the LM-based attack, with an average performance increase of 969%.
arXiv Detail & Related papers (2024-04-22T12:40:38Z)
- AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with a new website.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
- Unified Functional Hashing in Automatic Machine Learning [58.77232199682271]
We show that large efficiency gains can be obtained by employing a fast unified functional hash.
Our hash is "functional" in that it identifies equivalent candidates even if they were represented or coded differently.
We show dramatic improvements on multiple AutoML domains, including neural architecture search and algorithm discovery.
arXiv Detail & Related papers (2023-02-10T18:50:37Z)
- Effective and Efficient Query-aware Snippet Extraction for Web Search [61.60405035952961]
We propose an effective query-aware webpage snippet extraction method named DeepQSE.
DeepQSE first learns query-aware sentence representations for each sentence to capture the fine-grained relevance between query and sentence.
We propose an efficient version of DeepQSE, named Efficient-DeepQSE, which can significantly improve the inference speed of DeepQSE without affecting its performance.
arXiv Detail & Related papers (2022-10-17T07:46:17Z)
- Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval [25.402767809863946]
Inverted file structure is a common technique for accelerating dense retrieval.
In this work, we present the Hybrid Inverted Index (HI$^2$), where the embedding clusters and salient terms work to accelerate dense retrieval.
arXiv Detail & Related papers (2022-10-11T15:12:41Z)
- Unsupervised Key-phrase Extraction and Clustering for Classification Scheme in Scientific Publications [0.0]
We investigate possible ways of automating parts of the Systematic Mapping (SM) and Systematic Review (SR) process.
Key-phrases are extracted from scientific documents using unsupervised methods, which are then used to construct the corresponding Classification Scheme.
We also explore how clustering can be used to group related key-phrases.
arXiv Detail & Related papers (2021-01-25T10:17:33Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
- CoNCRA: A Convolutional Neural Network Code Retrieval Approach [0.0]
We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval.
Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language.
We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow.
arXiv Detail & Related papers (2020-09-03T23:38:52Z)
- CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search [102.67142711824748]
CATCH is a novel Context-bAsed meTa reinforcement learning algorithm for transferrable arChitecture searcH.
The combination of meta-learning and RL allows CATCH to efficiently adapt to new tasks while being agnostic to search spaces.
It is also capable of handling cross-domain architecture search, as it identifies competitive networks on ImageNet, COCO, and Cityscapes.
arXiv Detail & Related papers (2020-07-18T09:35:53Z)