Related papers: Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools?

URL: http://arxiv.org/abs/2601.00559v1
Date: Fri, 02 Jan 2026 04:17:36 GMT
Title: Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools?
Authors: Jason Quantrill, Noura Khajehnouri, Zihan Guo, Manar H. Alalfi
Abstract summary: This work presents the first comprehensive evaluation of Large Language Models (LLMs) across a multi-category interaction threat taxonomy. We benchmark Llama 3.1 8B, Llama 70B, GPT-4o, Gemini-2.5-Pro, and DeepSeek-R1 across zero-, one-, and two-shot settings. Our findings show that while LLMs exhibit promising semantic understanding, their accuracy degrades significantly for threats requiring cross-rule structural reasoning.
Score: 1.8549313085249322
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Smart home IoT platforms such as openHAB rely on Trigger Action Condition (TAC) rules to automate device behavior, but the interplay among these rules can give rise to interaction threats: unintended or unsafe behaviors emerging from implicit dependencies, conflicting triggers, or overlapping conditions. Identifying these threats requires semantic understanding and structural reasoning that traditionally depend on symbolic, constraint-driven static analysis. This work presents the first comprehensive evaluation of Large Language Models (LLMs) across a multi-category interaction threat taxonomy, assessing their performance on both the original openHAB (oHC/IoTB) dataset and a structurally challenging Mutation dataset designed to test robustness under rule transformations. We benchmark Llama 3.1 8B, Llama 70B, GPT-4o, Gemini-2.5-Pro, and DeepSeek-R1 across zero-, one-, and two-shot settings, comparing their results against oHIT's manually validated ground truth. Our findings show that while LLMs exhibit promising semantic understanding, particularly on action- and condition-related threats, their accuracy degrades significantly for threats requiring cross-rule structural reasoning, especially under mutated rule forms. Model performance varies widely across threat categories and prompt settings, with no model providing consistent reliability. In contrast, the symbolic reasoning baseline maintains stable detection across both datasets, unaffected by rule rewrites or structural perturbations. These results underscore that LLMs alone are not yet dependable for safety-critical interaction-threat detection in IoT environments. We discuss the implications for tool design and highlight the potential of hybrid architectures that combine symbolic analysis with LLM-based semantic interpretation to reduce false positives while maintaining structural rigor.

Related papers

Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z)
CoG: Controllable Graph Reasoning via Relational Blueprints and Failure-Aware Refinement over Knowledge Graphs [53.199517625701475]
CoG is a training-free framework inspired by Dual-Process Theory that mimics the interplay between intuition and deliberation. CoG significantly outperforms state-of-the-art approaches in both accuracy and efficiency.
arXiv Detail & Related papers (2026-01-16T07:27:40Z)
Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks [6.573058520271728]
We identify a connection between interpretability and robustness that can be directly leveraged during training. We introduce an attribution-guided refinement framework that transforms Local Interpretable Model-Agnostic Explanations into an active training signal.
arXiv Detail & Related papers (2026-01-02T19:36:03Z)
Energy-Efficient Multi-LLM Reasoning for Binary-Free Zero-Day Detection in IoT Firmware [5.485965161578769]
Existing analysis methods, such as static analysis, symbolic execution, and fuzzing, depend on binary visibility and functional emulation. We propose a binary-free, architecture-agnostic solution that estimates the likelihood of conceptual zero-day vulnerabilities using only high-level descriptors.
arXiv Detail & Related papers (2025-12-23T00:34:50Z)
MEEA: Mere Exposure Effect-Driven Confrontational Optimization for LLM Jailbreaking [10.331506725187038]
We propose MEEA, a fully automated framework for evaluating multi-turn safety robustness. MEEA builds semantically progressive prompt chains and optimizes them using a simulated annealing strategy. Our results show that MEEA consistently achieves higher attack success rates than seven representative baselines.
arXiv Detail & Related papers (2025-12-21T14:43:26Z)
Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs [38.3239023969819]
Large Language Models (LLMs) have emerged as powerful tools for diverse applications. We identify and propose a novel class of vulnerabilities, termed Tool-Completion Attack (TCA). We introduce Context-Aware Hierarchical Learning (CAHL) to address these vulnerabilities.
arXiv Detail & Related papers (2025-12-03T12:10:21Z)
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness [9.013874391203453]
Adversarial examples reveal critical vulnerabilities in deep neural networks by exploiting their sensitivity to imperceptible input perturbations. In this work, we investigate an architectural approach to adversarial robustness by embedding group-equivariant convolutions. These layers encode symmetry priors that align model behavior with structured transformations in the input space, promoting smoother decision boundaries.
arXiv Detail & Related papers (2025-10-17T19:26:58Z)
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z)
Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA). We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
Benchmarking the Spatial Robustness of DNNs via Natural and Adversarial Localized Corruptions [49.546479320670464]
This paper introduces specialized metrics for benchmarking the spatial robustness of segmentation models. We propose region-aware multi-attack adversarial analysis, a method that enables a deeper understanding of model robustness. The results reveal that models respond to these two types of threats differently.
arXiv Detail & Related papers (2025-04-02T11:37:39Z)
Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the segment anything model (SAM). To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.