Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
- URL: http://arxiv.org/abs/2511.02894v2
- Date: Mon, 10 Nov 2025 00:01:56 GMT
- Title: Adaptive and Robust Data Poisoning Detection and Sanitization in Wearable IoT Systems using Large Language Models
- Authors: W. K. M Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
- Abstract summary: This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems. Our approach incorporates role-play prompting, whereby the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost.
- Score: 4.285609194445095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The widespread integration of wearable sensing devices in Internet of Things (IoT) ecosystems, particularly in healthcare, smart homes, and industrial applications, has required robust human activity recognition (HAR) techniques to improve functionality and user experience. Although machine learning models have advanced HAR, they are increasingly susceptible to data poisoning attacks that compromise the data integrity and reliability of these systems. Conventional approaches to defending against such attacks often require extensive task-specific training with large, labeled datasets, which limits adaptability in dynamic IoT environments. This work proposes a novel framework that uses large language models (LLMs) to perform poisoning detection and sanitization in HAR systems, utilizing zero-shot, one-shot, and few-shot learning paradigms. Our approach incorporates "role play" prompting, whereby the LLM assumes the role of an expert to contextualize and evaluate sensor anomalies, and "think step-by-step" reasoning, guiding the LLM to infer poisoning indicators in the raw sensor data and plausible clean alternatives. These strategies minimize reliance on curation of extensive datasets and enable robust, adaptable defense mechanisms in real-time. We perform an extensive evaluation of the framework, quantifying detection accuracy, sanitization quality, latency, and communication cost, thus demonstrating the practicality and effectiveness of LLMs in improving the security and reliability of wearable IoT systems.
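The prompting strategy the abstract describes (a role-play system instruction plus "think step-by-step" reasoning, optionally preceded by few-shot examples) can be sketched as a prompt-assembly routine. This is an illustrative reconstruction, not the paper's actual implementation: the function name, verdict labels, and sample sensor values are all hypothetical.

```python
# Hypothetical sketch of the role-play + step-by-step prompting strategy
# described in the abstract. Names and thresholds are illustrative only.

def build_sanitization_prompt(sensor_window, few_shot_examples=None):
    """Assemble an LLM prompt asking for a poisoning verdict and, if
    needed, a plausible clean alternative for a sensor window."""
    # Role-play instruction: the LLM acts as a wearable-sensing expert.
    system = (
        "You are an expert in wearable-sensor human activity recognition. "
        "Examine the accelerometer window below for signs of data poisoning "
        "(e.g., injected spikes, scaled values, implausible transitions). "
        "Think step by step, then output a verdict (CLEAN or POISONED) "
        "and, if poisoned, a plausible sanitized version of the window."
    )
    parts = [system]
    # Zero-shot if no examples are given; one-/few-shot otherwise.
    for example in few_shot_examples or []:
        parts.append(f"Example window: {example['window']}")
        parts.append(f"Example verdict: {example['verdict']}")
    parts.append(f"Window to analyze: {sensor_window}")
    return "\n\n".join(parts)

# Zero-shot versus one-shot usage on a window with a suspicious spike:
window = [0.01, 0.02, 9.81, 0.03, 0.02]
zero_shot_prompt = build_sanitization_prompt(window)
one_shot_prompt = build_sanitization_prompt(
    window,
    few_shot_examples=[{"window": [0.01, 0.02, 0.02], "verdict": "CLEAN"}],
)
```

The resulting string would be sent to an LLM endpoint; the paper's reported latency and communication costs would depend on the model and deployment, which this sketch does not model.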
Related papers
- Benchmarking Machine Learning Models for IoT Malware Detection under Data Scarcity and Drift [0.5735035463793007]
Internet of Things (IoT) devices are prime targets for cyberattacks and malware applications. Machine learning (ML) offers a promising approach to automated malware detection and classification. This study investigates the effectiveness of four supervised learning models for malware detection and classification.
arXiv Detail & Related papers (2026-01-26T17:59:33Z) - Multi-Agent Collaborative Intrusion Detection for Low-Altitude Economy IoT: An LLM-Enhanced Agentic AI Framework [60.72591149679355]
The rapid expansion of low-altitude economy Internet of Things (LAE-IoT) networks has created unprecedented security challenges. Traditional intrusion detection systems fail to tackle the unique characteristics of aerial IoT environments. We introduce a large language model (LLM)-enabled agentic AI framework for enhancing intrusion detection in LAE-IoT networks.
arXiv Detail & Related papers (2026-01-25T12:47:25Z) - MIRAGE: Misleading Retrieval-Augmented Generation via Black-box and Query-agnostic Poisoning Attacks [47.46936341268548]
Retrieval-Augmented Generation (RAG) systems introduce a critical attack surface: corpus poisoning. We propose MIRAGE, a novel multi-stage poisoning pipeline designed for strict black-box and query-agnostic environments. Extensive experiments demonstrate that MIRAGE significantly outperforms existing baselines in both attack efficacy and stealthiness.
arXiv Detail & Related papers (2025-12-09T06:38:16Z) - From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM [52.64097278841485]
This review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions. Fast adaptation methods, including meta-learning and few-shot learning, are reviewed alongside domain generalization techniques.
arXiv Detail & Related papers (2025-09-25T14:15:43Z) - Securing Radiation Detection Systems with an Efficient TinyML-Based IDS for Edge Devices [3.5216201054915692]
Radiation Detection Systems (RDSs) play a vital role in ensuring public safety across various settings. These systems are increasingly vulnerable to cyber-attacks. This paper presents a new synthetic radiation dataset and an Intrusion Detection System (IDS) tailored for resource-constrained environments.
arXiv Detail & Related papers (2025-09-01T16:26:37Z) - Contrastive-KAN: A Semi-Supervised Intrusion Detection Framework for Cybersecurity with scarce Labeled Data [0.0]
We propose a real-time intrusion detection system based on a semi-supervised contrastive learning framework. Our method leverages abundant unlabeled data to effectively distinguish between normal and attack behaviors. Experimental results show that our method outperforms existing contrastive learning-based approaches.
arXiv Detail & Related papers (2025-07-14T21:02:34Z) - Expert-in-the-Loop Systems with Cross-Domain and In-Domain Few-Shot Learning for Software Vulnerability Detection [38.083049237330826]
This study explores the use of Large Language Models (LLMs) in software vulnerability assessment by simulating the identification of Python code with known Common Weakness Enumerations (CWEs). Our results indicate that while zero-shot prompting performs poorly, few-shot prompting significantly enhances classification performance. Challenges such as model reliability, interpretability, and adversarial robustness remain critical areas for future research.
arXiv Detail & Related papers (2025-06-11T18:43:51Z) - LSM-2: Learning from Incomplete Wearable Sensor Data [65.58595667477505]
This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM). AIM learns robust representations directly from incomplete data without requiring explicit imputation. Our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression, and generative modeling.
arXiv Detail & Related papers (2025-06-05T17:57:11Z) - LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems [6.649910168731417]
This paper presents a novel Large Language Model (LLM)-based framework for comprehensive threat detection and prevention in IoT environments. The system integrates lightweight LLMs fine-tuned on IoT-specific datasets for real-time anomaly detection and automated, context-aware mitigation strategies. Experimental results in simulated IoT environments demonstrate significant improvements in detection accuracy, response latency, and resource efficiency over traditional security methods.
arXiv Detail & Related papers (2025-05-01T01:18:54Z) - LENS-XAI: Redefining Lightweight and Explainable Network Security through Knowledge Distillation and Variational Autoencoders for Scalable Intrusion Detection in Cybersecurity [0.0]
This study introduces the Lightweight Explainable Network Security framework (LENS-XAI). LENS-XAI combines robust intrusion detection with enhanced interpretability and scalability. This research contributes significantly to advancing IDS by addressing computational efficiency, feature interpretability, and real-world applicability.
arXiv Detail & Related papers (2025-01-01T10:00:49Z) - Automated Phishing Detection Using URLs and Webpages [35.66275851732625]
This project addresses the constraints of traditional reference-based phishing detection by developing an LLM agent framework.
This agent harnesses Large Language Models to actively fetch and utilize online information.
Our approach achieved an accuracy of 0.945, significantly outperforming the existing solution (DynaPhish) by 0.445.
arXiv Detail & Related papers (2024-08-03T05:08:27Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model for Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experiment results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.