Energy-Efficient Multi-LLM Reasoning for Binary-Free Zero-Day Detection in IoT Firmware
- URL: http://arxiv.org/abs/2512.19945v1
- Date: Tue, 23 Dec 2025 00:34:50 GMT
- Title: Energy-Efficient Multi-LLM Reasoning for Binary-Free Zero-Day Detection in IoT Firmware
- Authors: Saeid Jamshidi, Omar Abdul-Wahab, Martine Bellaïche, Foutse Khomh,
- Abstract summary: Existing analysis methods, such as static analysis, symbolic execution, and fuzzing, depend on binary visibility and functional emulation. We propose a binary-free, architecture-agnostic solution that estimates the likelihood of conceptual zero-day vulnerabilities using only high-level descriptors.
- Score: 5.485965161578769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Securing Internet of Things (IoT) firmware remains difficult due to proprietary binaries, stripped symbols, heterogeneous architectures, and limited access to executable code. Existing analysis methods, such as static analysis, symbolic execution, and fuzzing, depend on binary visibility and functional emulation, making them unreliable when firmware is encrypted or inaccessible. To address this limitation, we propose a binary-free, architecture-agnostic solution that estimates the likelihood of conceptual zero-day vulnerabilities using only high-level descriptors. The approach integrates a tri-LLM reasoning architecture combining a LLaMA-based configuration interpreter, a DeepSeek-based structural abstraction analyzer, and a GPT-4o semantic fusion model. The solution also incorporates LLM computational signatures, including latency patterns, uncertainty markers, and reasoning depth indicators, as well as an energy-aware symbolic load model, to enhance interpretability and operational feasibility. In addition, we formally derive the mathematical foundations of the reasoning pipeline, establishing monotonicity, divergence, and energy-risk coupling properties that theoretically justify the model's behavior. Simulation-based evaluation reveals that high exposure conditions increase the predicted zero-day likelihood by 20 to 35 percent across models, with GPT-4o demonstrating the strongest cross-layer correlations and the highest sensitivity. Energy and divergence metrics significantly predict elevated risk (p < 0.01), reinforcing the effectiveness of the proposed reasoning framework.
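The fusion step the abstract describes, combining three per-model likelihoods while coupling divergence and energy load to the risk estimate, can be sketched roughly as follows. All function names, weights, and the combination rule are illustrative assumptions for exposition, not the paper's actual equations.

```python
# Illustrative sketch of a tri-LLM risk-fusion step. The weights and the
# combination rule below are assumptions, not the paper's formulation.
from statistics import mean, pstdev

def fuse_zero_day_likelihood(scores, energy_j, energy_budget_j=10.0):
    """Combine per-model likelihoods in [0, 1] into one risk estimate.

    `scores` holds one likelihood per model (e.g. configuration
    interpreter, structural analyzer, semantic fusion model).
    """
    consensus = mean(scores)          # agreement across the three LLMs
    divergence = pstdev(scores)       # disagreement between the models
    energy_load = min(energy_j / energy_budget_j, 1.0)
    # Divergence and energy load both push the estimate upward, loosely
    # mirroring the abstract's divergence and energy-risk coupling claims.
    risk = consensus + 0.5 * divergence * (1.0 + energy_load)
    return min(risk, 1.0)

per_model = {"config": 0.62, "structure": 0.71, "fusion": 0.80}
risk = fuse_zero_day_likelihood(list(per_model.values()), energy_j=6.0)
```

Under this toy rule, higher disagreement between the three models or a heavier energy footprint both raise the reported likelihood, which is the qualitative behavior the abstract attributes to its monotonicity and energy-risk coupling properties.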
Related papers
- TorchLean: Formalizing Neural Networks in Lean [71.68907600404513]
We introduce TorchLean, a framework that treats learned models as first-class mathematical objects with a single, precise semantics shared by execution and verification. We validate TorchLean end-to-end on certified robustness, physics-informed residual bounds for PINNs, and Lyapunov-style neural controller verification.
arXiv Detail & Related papers (2026-02-26T05:11:44Z)
- PhyNiKCE: A Neurosymbolic Agentic Framework for Autonomous Computational Fluid Dynamics [0.0]
This work introduces PhyNiKCE, a neurosymbolic agentic framework for trustworthy engineering. Unlike standard black-box agents, PhyNiKCE decouples neural planning from symbolic validation. This architecture offers a scalable, auditable paradigm for Trustworthy Artificial Intelligence in broader industrial automation.
arXiv Detail & Related papers (2026-02-12T07:37:56Z)
- RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis [53.90240071275054]
The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. We propose a systematic framework that unifies architectural primitives and hardware constraints through the lens of operational intensity (OI). By defining an inference-potential region, we introduce the Relative Inference Potential as a novel metric to compare efficiency differences between Large Language Models (LLMs) on the same hardware substrate.
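The operational-intensity lens in this abstract follows the standard roofline model; a minimal sketch, with hypothetical device numbers that are not drawn from the paper, looks like:

```python
# Minimal roofline-model sketch. The device figures are made up;
# only the standard roofline relationships are shown.
def operational_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic (FLOP/byte)."""
    return flops / bytes_moved

def attainable_gflops(oi, peak_gflops, mem_bw_gbs):
    """Roofline: performance is capped by compute or by bandwidth * OI."""
    return min(peak_gflops, mem_bw_gbs * oi)

# A kernel doing 2 GFLOP over 0.4 GB of traffic on a hypothetical edge SoC:
oi = operational_intensity(flops=2e9, bytes_moved=4e8)            # 5 FLOP/byte
perf = attainable_gflops(oi, peak_gflops=100.0, mem_bw_gbs=10.0)  # bandwidth-bound
```

Kernels whose OI falls below the ridge point (peak FLOPs divided by memory bandwidth) are bandwidth-bound, which is the regime where on-device LLM inference typically sits.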
arXiv Detail & Related papers (2026-02-12T03:02:22Z)
- CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis [0.0]
We introduce CircuChain, a diagnostic benchmark designed to disentangle instruction compliance from physical reasoning competence in electrical circuit analysis. A multi-stage verification pipeline, combining symbolic solvers, SPICE simulation, and an LLM-based error taxonomy, enables fine-grained attribution of failures to convention errors. The strongest model evaluated exhibits near-perfect physical reasoning but a high rate of convention violations when Trap conditions deliberately invert natural sign patterns.
arXiv Detail & Related papers (2026-01-29T06:13:44Z)
- Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs [50.075587392477935]
We conduct the first large-scale empirical study of 705 real-world failures from the open-source DeepSeek, Llama, and Qwen ecosystems. Our analysis reveals a paradigm shift: white-box orchestration relocates the reliability bottleneck from model algorithmic defects to the systemic fragility of the deployment stack.
arXiv Detail & Related papers (2026-01-20T06:42:56Z)
- Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching [0.42970700836450487]
This paper builds on work that introduced a four-metric Total Injection Vulnerability Score (TIVS). It investigates how defence effectiveness interacts with transparency in a HOPE-inspired Nested Learning architecture. Experiments show that the system achieves secure responses with zero high-risk breaches.
arXiv Detail & Related papers (2026-01-19T16:10:11Z)
- AmbShield: Enhancing Physical Layer Security with Ambient Backscatter Devices against Eavesdroppers [69.56534335936534]
AmbShield is an AmBD-assisted PLS scheme that leverages naturally distributed AmBDs to simultaneously strengthen the legitimate channel and degrade eavesdroppers'. In AmbShield, AmBDs are exploited as friendly jammers that randomly backscatter to create interference at eavesdroppers, and as passive relays that backscatter the desired signal to enhance the capacity of legitimate devices.
arXiv Detail & Related papers (2026-01-14T20:56:50Z)
- Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools? [1.8549313085249322]
This work presents the first comprehensive evaluation of Large Language Models (LLMs) across a multi-category interaction threat taxonomy. We benchmark Llama 3.1 8B, Llama 70B, GPT-4o, Gemini-2.5-Pro, and DeepSeek-R1 across zero-, one-, and two-shot settings. Our findings show that while LLMs exhibit promising semantic understanding, their accuracy degrades significantly for threats requiring cross-rule structural reasoning.
arXiv Detail & Related papers (2026-01-02T04:17:36Z)
- Interpretable Hybrid Deep Q-Learning Framework for IoT-Based Food Spoilage Prediction with Synthetic Data Generation and Hardware Validation [0.5417521241272645]
The need for an intelligent, real-time spoilage prediction system has become critical in modern IoT-driven food supply chains. We propose a hybrid reinforcement learning framework integrating Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNN) for enhanced spoilage prediction.
arXiv Detail & Related papers (2025-12-22T12:59:48Z)
- GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction [51.83437071408662]
We propose GLOW, a unified framework for AW performance prediction. GLOW combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
arXiv Detail & Related papers (2025-12-11T13:30:46Z)
- Hierarchical Evaluation of Software Design Capabilities of Large Language Models of Code [7.897548449569687]
Large language models (LLMs) are increasingly adopted in the software engineering domain, yet the robustness of their grasp on core design concepts remains unclear. We generate poorly designed software fragments under various levels of guidance. Reasoning about coupling proves brittle; performance collapses in noisy, open-ended scenarios. Reasoning-trace analysis confirms these failure modes, revealing cognitive shortcutting for coupling versus a more exhaustive (yet still failing) analysis for cohesion.
arXiv Detail & Related papers (2025-11-25T23:50:00Z)
- LTD-Bench: Evaluating Large Language Models by Letting Them Draw [57.237152905238084]
LTD-Bench is a breakthrough benchmark for large language models (LLMs). It transforms LLM evaluation from abstract scores to directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code. LTD-Bench's visual outputs enable powerful diagnostic analysis, offering a potential approach to investigate model similarity.
arXiv Detail & Related papers (2025-11-04T08:11:23Z)
- SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z)
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [86.76714527437383]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- LENS-XAI: Redefining Lightweight and Explainable Network Security through Knowledge Distillation and Variational Autoencoders for Scalable Intrusion Detection in Cybersecurity [0.0]
This study introduces the Lightweight Explainable Network Security framework (LENS-XAI). LENS-XAI combines robust intrusion detection with enhanced interpretability and scalability. This research contributes significantly to advancing IDS by addressing computational efficiency, feature interpretability, and real-world applicability.
arXiv Detail & Related papers (2025-01-01T10:00:49Z)
- Secure Instruction and Data-Level Information Flow Tracking Model for RISC-V [0.0]
Unauthorized access, fault injection, and privacy invasion are potential threats from untrusted actors.
We propose an integrated Information Flow Tracking (IFT) technique to enable runtime security to protect system integrity.
This study proposes a multi-level IFT model that integrates a hardware-based IFT technique with a gate-level-based IFT (GLIFT) technique.
arXiv Detail & Related papers (2023-11-17T02:04:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information shown) and is not responsible for any consequences.