Related papers: Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm)

URL: http://arxiv.org/abs/2502.07815v1
Date: Sun, 09 Feb 2025 07:24:16 GMT
Title: Decoding Complexity: Intelligent Pattern Exploration with CHPDA (Context Aware Hybrid Pattern Detection Algorithm)
Authors: Lokesh Koli, Shubham Kalra, Karanpreet Singh
Abstract summary: This study evaluates regex-based pattern matching algorithms and exact-match search techniques to optimize detection speed, accuracy, and scalability. For exact matching, Aho-Corasick demonstrated superior performance (8 ms/MB) and scalability for large datasets. Despite the solution's effectiveness, challenges remain, such as limited multilingual support and the need for regular pattern updates.
Score: 0.36868085124383626
License:
Abstract: Detecting sensitive data such as Personally Identifiable Information (PII) and Protected Health Information (PHI) is critical for data security platforms. This study evaluates regex-based pattern matching algorithms and exact-match search techniques to optimize detection speed, accuracy, and scalability. Our benchmarking results indicate that Google RE2 provides the best balance of speed (10-15 ms/MB), memory efficiency (8-16 MB), and accuracy (99.5%) among regex engines, outperforming PCRE while maintaining broader hardware compatibility than Hyperscan. For exact matching, Aho-Corasick demonstrated superior performance (8 ms/MB) and scalability for large datasets. Performance analysis revealed that regex processing time scales linearly with dataset size and pattern complexity. A hybrid AI + Regex approach achieved the highest F1 score (91.6%) by improving recall and minimizing false positives. Device benchmarking confirmed that our solution maintains efficient CPU and memory usage on both high-performance and mid-range systems. Despite its effectiveness, challenges remain, such as limited multilingual support and the need for regular pattern updates. Future work should focus on expanding language coverage, integrating data security and privacy management (DSPM) with data loss prevention (DLP) tools, and enhancing regulatory compliance for broader global adoption.
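As a rough illustration of the regex-based detection and the F1 metric the abstract refers to, here is a minimal Python sketch. It is not the authors' implementation: it uses Python's built-in `re` module rather than RE2, PCRE, or Hyperscan, and the two patterns (`email`, `ssn`) and the helper names `find_sensitive` and `f1_score` are hypothetical choices made only for this example.

```python
import re

# Illustrative patterns only (assumption): the paper's actual pattern set
# is not given in the abstract.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a sorted list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return sorted(hits)


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), computed from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = "Contact jane@example.com, SSN 123-45-6789."
    print(find_sensitive(sample))
    print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 is the harmonic mean of precision and recall, a detector can raise it either by catching more true items (recall) or by flagging fewer false ones (precision), which is the trade-off the hybrid approach is reported to improve.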

Related papers

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation [93.38604803625294]
We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG). We use Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks. UncertaintyRAG outperforms baselines by 2.03% on LLaMA-2-7B, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-10-03T17:39:38Z)
Hybrid Machine Learning Approach For Real-Time Malicious URL Detection Using SOM-RMO And RBFN With Tabu Search Optimization [0.0]
The proliferation of malicious URLs has become a significant threat to internet security. Traditional detection methods struggle to keep pace with the evolving nature of these threats. We propose a hybrid machine learning approach combining efficient feature extraction with accurate classification.
arXiv Detail & Related papers (2024-07-05T07:24:49Z)
Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR). Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design. The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z)
RIDE: Real-time Intrusion Detection via Explainable Machine Learning Implemented in a Memristor Hardware Architecture [24.824596231020585]
We propose a packet-level network intrusion detection solution that makes use of Recurrent Autoencoders to integrate an arbitrary-length sequence of packets into a more compact joint feature embedding. We show that our approach leads to an extremely efficient, real-time solution with high detection accuracy at the packet level.
arXiv Detail & Related papers (2023-11-27T17:30:19Z)
Improved Sparse Ising Optimization [0.0]
This report presents new data demonstrating significantly higher performance on some longstanding benchmark problems with up to 20,000 variables. Relative to leading reported combinations of speed and accuracy, a proof-of-concept implementation reached targets 2-4 orders of magnitude faster. The data suggest exciting possibilities for pushing the sparse Ising performance frontier to potentially strengthen algorithm portfolios, AI toolkits and decision-making systems.
arXiv Detail & Related papers (2023-11-15T17:59:06Z)
PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce. We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD. Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z)
Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection [17.761541379830373]
DeepDFA is a dataflow analysis-inspired graph learning framework. It was trained in 9 minutes, 75x faster than the highest-performing baseline model. It detected 8.7 out of 17 vulnerabilities on average across folds and was able to distinguish between patched and buggy versions.
arXiv Detail & Related papers (2022-12-15T19:49:27Z)
A Dependable Hybrid Machine Learning Model for Network Intrusion Detection [1.222622290392729]
We propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022.
arXiv Detail & Related papers (2022-12-08T20:19:27Z)
Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively. We prove that the proposed method achieves a linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC). We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them with a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
Bayesian Optimization with Machine Learning Algorithms Towards Anomaly Detection [66.05992706105224]
In this paper, an effective anomaly detection framework is proposed utilizing the Bayesian Optimization technique. The performance of the considered algorithms is evaluated using the ISCX 2012 dataset. Experimental results show the effectiveness of the proposed framework in terms of accuracy rate, precision, low false-alarm rate, and recall.
arXiv Detail & Related papers (2020-08-05T19:29:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.