Enhancing Cloud Security through Topic Modelling
- URL: http://arxiv.org/abs/2505.01463v2
- Date: Fri, 27 Jun 2025 04:34:30 GMT
- Title: Enhancing Cloud Security through Topic Modelling
- Authors: Sabbir M. Saleh, Nazim Madhavji, John Steinbacher
- Abstract summary: This research explores the application of Natural Language Processing (NLP) techniques to analyse security-related text data and anticipate potential threats. We focus on Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) to extract meaningful patterns from data sources.
- Score: 0.6117371161379209
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Protecting cloud applications is critical in an era where security threats are increasingly sophisticated and persistent. Continuous Integration and Continuous Deployment (CI/CD) pipelines are particularly vulnerable, making innovative security approaches essential. This research explores the application of Natural Language Processing (NLP) techniques, specifically Topic Modelling, to analyse security-related text data and anticipate potential threats. We focus on Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA) to extract meaningful patterns from data sources, including logs, reports, and deployment traces. Using the Gensim framework in Python, these methods categorise log entries into security-relevant topics (e.g., phishing, encryption failures). The identified topics are leveraged to highlight patterns indicative of security issues across CI/CD's continuous stages (build, test, deploy). This approach introduces a semantic layer that supports early vulnerability recognition and contextual understanding of runtime behaviours.
Related papers
- Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security [63.41350337821108]
We propose Secure Tug-of-War (SecTOW) to enhance the security of multimodal large language models (MLLMs). SecTOW consists of two modules: a defender and an auxiliary attacker, both trained iteratively using reinforcement learning (GRPO). We show that SecTOW significantly improves security while preserving general performance.
arXiv Detail & Related papers (2025-07-29T17:39:48Z) - A Survey on Model Extraction Attacks and Defenses for Large Language Models [55.60375624503877]
Model extraction attacks pose significant security threats to deployed language models. This survey provides a comprehensive taxonomy of extraction attacks and defenses, categorizing attacks into functionality extraction, training data extraction, and prompt-targeted attacks. We examine defense mechanisms organized into model protection, data privacy protection, and prompt-targeted strategies, evaluating their effectiveness across different deployment scenarios.
arXiv Detail & Related papers (2025-06-26T22:02:01Z) - Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation [70.62656296780074]
We propose AIDSAFE: Agentic Iterative Deliberation for Safety Reasoning, a novel data generation recipe. A data refiner stage in AIDSAFE ensures high-quality outputs by eliminating repetitive, redundant, and deceptive thoughts. Our evaluations demonstrate that AIDSAFE-generated CoTs achieve superior policy adherence and reasoning quality.
arXiv Detail & Related papers (2025-05-27T21:34:40Z) - Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation [52.626086874715284]
We introduce a novel problem formulation called Abstract DNN-Verification, which verifies a hierarchical structure of unsafe outputs. By leveraging abstract interpretation and reasoning about output reachable sets, our approach enables assessing multiple safety levels during the formal verification process. Our contributions include a theoretical exploration of the relationship between our novel abstract safety formulation and existing approaches.
arXiv Detail & Related papers (2025-05-08T13:29:46Z) - From Data Behavior to Code Analysis: A Multimodal Study on Security and Privacy Challenges in Blockchain-Based DApp [1.6081378516701994]
The recent proliferation of blockchain-based decentralized applications (DApp) has catalyzed transformative advancements in distributed systems. This study initiates with a systematic analysis of behavioral patterns derived from empirical DApp datasets. The principal security vulnerabilities in smart contracts developed via Solidity are then critically examined.
arXiv Detail & Related papers (2025-04-16T08:30:43Z) - Temporal Context Awareness: A Defense Framework Against Multi-turn Manipulation Attacks on Large Language Models [0.0]
Large Language Models (LLMs) are increasingly vulnerable to sophisticated multi-turn manipulation attacks. This paper introduces the Temporal Context Awareness framework, a novel defense mechanism designed to address this challenge. Preliminary evaluations on simulated adversarial scenarios demonstrate the framework's potential to identify subtle manipulation patterns.
arXiv Detail & Related papers (2025-03-18T22:30:17Z) - Computational Safety for Generative AI: A Signal Processing Perspective [65.268245109828]
Computational safety is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI. We show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. We discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety.
arXiv Detail & Related papers (2025-02-18T02:26:50Z) - Safety at Scale: A Comprehensive Survey of Large Model Safety [298.05093528230753]
We present a comprehensive taxonomy of safety threats to large models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. We identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices.
arXiv Detail & Related papers (2025-02-02T05:14:22Z) - Beyond the Surface: An NLP-based Methodology to Automatically Estimate CVE Relevance for CAPEC Attack Patterns [42.63501759921809]
We propose a methodology leveraging Natural Language Processing (NLP) to associate Common Vulnerabilities and Exposures (CVE) entries with Common Attack Pattern Enumeration and Classification (CAPEC) attack patterns. Experimental evaluations demonstrate superior performance compared to state-of-the-art models.
arXiv Detail & Related papers (2025-01-13T08:39:52Z) - Model Inversion Attacks: A Survey of Approaches and Countermeasures [59.986922963781]
Recently, a new type of privacy attack, the model inversion attack (MIA), aims to extract sensitive features of the private data used for training.
Despite the significance, there is a lack of systematic studies that provide a comprehensive overview and deeper insights into MIAs.
This survey aims to summarize up-to-date MIA methods in both attacks and defenses.
arXiv Detail & Related papers (2024-11-15T08:09:28Z) - Advancing Software Security and Reliability in Cloud Platforms through AI-based Anomaly Detection [0.5599792629509228]
This research aims to enhance CI/CD pipeline security by implementing anomaly detection through AI support.
The goal is to identify unusual behaviour or variations from network traffic patterns in pipeline and cloud platforms.
We implemented a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) to detect unusual traffic patterns.
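The paper names a CNN+LSTM combination for traffic anomaly detection. A minimal PyTorch sketch of one plausible arrangement of those layers (the layer sizes, window length, and feature count are illustrative assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class CNNLSTMDetector(nn.Module):
    """Scores fixed-length windows of traffic features for anomalies."""

    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        # 1-D convolution extracts local patterns across each window.
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        # LSTM captures temporal dependencies over the convolved sequence.
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        h = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, 16, seq_len)
        out, _ = self.lstm(h.transpose(1, 2))         # (batch, seq_len, hidden)
        # Sigmoid on the last timestep yields an anomaly score in [0, 1].
        return torch.sigmoid(self.head(out[:, -1]))
```

Trained against labelled (or reconstruction-based) targets, windows scoring above a chosen threshold would be flagged as unusual traffic.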
arXiv Detail & Related papers (2024-11-14T05:45:55Z) - Models Are Codes: Towards Measuring Malicious Code Poisoning Attacks on Pre-trained Model Hubs [10.252989233081395]
This paper presents the first systematic study of malicious code poisoning attacks on pre-trained model hubs, focusing on the Hugging Face platform.
We propose MalHug, an end-to-end pipeline tailored for Hugging Face that combines dataset loading script extraction, model deserialization, and taint pattern matching.
MalHug has monitored more than 705K models and 176K datasets, uncovering 91 malicious models and 9 malicious dataset loading scripts.
arXiv Detail & Related papers (2024-09-14T08:47:22Z) - SETC: A Vulnerability Telemetry Collection Framework [0.0]
This paper introduces the Security Exploit Telemetry Collection (SETC) framework.
SETC generates reproducible vulnerability exploit data at scale for robust defensive security research.
This research enables scalable exploit data generation to drive innovations in threat modeling, detection methods, analysis techniques, and mitigation strategies.
arXiv Detail & Related papers (2024-06-10T00:13:35Z) - A Survey of Unikernel Security: Insights and Trends from a Quantitative Analysis [0.0]
This research presents a quantitative methodology using TF-IDF to analyze the focus of security discussions within unikernel research literature.
Memory Protection Extensions and Data Execution Prevention were the least frequently occurring topics, while SGX was the most frequent topic.
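TF-IDF, the weighting the survey applies to its corpus, is simple enough to sketch from scratch. A minimal stdlib-only version (the example token lists are invented for illustration):

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """docs: list of token lists. Returns one dict of TF-IDF weights per doc."""
    n = len(docs)
    # Document frequency: in how many docs does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency scaled by inverse document frequency;
        # terms appearing in every doc get weight 0.
        out.append({t: (c / len(doc)) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out

docs = [["sgx", "enclave", "sgx"], ["memory", "protection"], ["sgx", "memory"]]
print(tfidf_scores(docs))
```

Ranking terms by summed TF-IDF weight across a paper corpus is how such an analysis surfaces dominant topics like SGX versus rarely discussed ones.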
arXiv Detail & Related papers (2024-06-04T00:51:12Z) - CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z) - A Review of Topological Data Analysis for Cybersecurity [1.0878040851638]
Topological Data Analysis (TDA) studies the high level structure of data using techniques from algebraic topology.
We hope to highlight to researchers a promising new area with strong potential to improve cybersecurity data science.
arXiv Detail & Related papers (2022-02-16T13:03:52Z) - OntoEnricher: A Deep Learning Approach for Ontology Enrichment from Unstructured Text [2.707154152696381]
Existing information on vulnerabilities, controls, and advisories available on the web provides an opportunity to represent knowledge and perform analytics to mitigate some of the concerns.
This necessitates dynamic and automated enrichment of information security ontologies.
Existing ontology enrichment algorithms based on natural language processing and ML models have issues with the contextual extraction of concepts in words, phrases and sentences.
arXiv Detail & Related papers (2021-02-08T09:43:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.