Related papers: Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models

URL: http://arxiv.org/abs/2406.05364v1
Date: Sat, 8 Jun 2024 05:45:42 GMT
Title: Is On-Device AI Broken and Exploitable? Assessing the Trust and Ethics in Small Language Models
Authors: Kalyan Nakka, Jimmy Dani, Nitesh Saxena,
Abstract summary: We present a first study to investigate trust and ethical implications of on-device artificial intelligence (AI) We focus on ''small'' language models (SLMs) amenable for personal devices like smartphones. Our results show on-device SLMs to be significantly less trustworthy, specifically demonstrating more stereotypical, unfair and privacy-breaching behavior.
Score: 1.5953412143328967
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present a very first study to investigate trust and ethical implications of on-device artificial intelligence (AI), focusing on ''small'' language models (SLMs) amenable for personal devices like smartphones. While on-device SLMs promise enhanced privacy, reduced latency, and improved user experience compared to cloud-based services, we posit that they might also introduce significant challenges and vulnerabilities compared to on-server counterparts. As part of our trust assessment study, we conduct a systematic evaluation of the state-of-the-art on-devices SLMs, contrasted to their on-server counterparts, based on a well-established trustworthiness measurement framework. Our results show on-device SLMs to be (statistically) significantly less trustworthy, specifically demonstrating more stereotypical, unfair and privacy-breaching behavior. Informed by these findings, we then perform our ethics assessment study by inferring whether SLMs would provide responses to potentially unethical vanilla prompts, collated from prior jailbreaking and prompt engineering studies and other sources. Strikingly, the on-device SLMs did answer valid responses to these prompts, which ideally should be rejected. Even more seriously, the on-device SLMs responded with valid answers without any filters and without the need for any jailbreaking or prompt engineering. These responses can be abused for various harmful and unethical scenarios including: societal harm, illegal activities, hate, self-harm, exploitable phishing content and exploitable code, all of which indicates the high vulnerability and exploitability of these on-device SLMs. Overall, our findings highlight gaping vulnerabilities in state-of-the-art on-device AI which seem to stem from resource constraints faced by these models and which may make typical defenses fundamentally challenging to be deployed in these environments.

Related papers

SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [50.34706204154244]
Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. Certain scenarios suffer 25 times higher attack rates. Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction.
arXiv Detail & Related papers (2025-04-09T06:53:23Z)
Internal Activation as the Polar Star for Steering Unsafe LLM Behavior [50.463399903987245]
We introduce SafeSwitch, a framework that dynamically regulates unsafe outputs by monitoring and utilizing the model's internal states. Our empirical results show that SafeSwitch reduces harmful outputs by over 80% on safety benchmarks while maintaining strong utility.
arXiv Detail & Related papers (2025-02-03T04:23:33Z)
Retention Score: Quantifying Jailbreak Risks for Vision Language Models [60.48306899271866]
Vision-Language Models (VLMs) are integrated with Large Language Models (LLMs) to enhance multi-modal machine learning capabilities. This paper aims to assess the resilience of VLMs against jailbreak attacks that can compromise model safety compliance and result in harmful outputs. To evaluate a VLM's ability to maintain its robustness against adversarial input perturbations, we propose a novel metric called the textbfRetention Score.
arXiv Detail & Related papers (2024-12-23T13:05:51Z)
VMGuard: Reputation-Based Incentive Mechanism for Poisoning Attack Detection in Vehicular Metaverse [52.57251742991769]
vehicular Metaverse guard (VMGuard) protects vehicular Metaverse systems from data poisoning attacks. VMGuard implements a reputation-based incentive mechanism to assess the trustworthiness of participating SIoT devices. Our system ensures that reliable SIoT devices, previously missclassified, are not barred from participating in future rounds of the market.
arXiv Detail & Related papers (2024-12-05T17:08:20Z)
In-Context Experience Replay Facilitates Safety Red-Teaming of Text-to-Image Diffusion Models [97.82118821263825]
Text-to-image (T2I) models have shown remarkable progress, but their potential to generate harmful content remains a critical concern in the ML community. We propose ICER, a novel red-teaming framework that generates interpretable and semantic meaningful problematic prompts. Our work provides crucial insights for developing more robust safety mechanisms in T2I systems.
arXiv Detail & Related papers (2024-11-25T04:17:24Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes [0.0]
Large Language Models (LLMs) have gained widespread adoption across various domains, including chatbots and auto-task completion agents. These models are susceptible to safety vulnerabilities such as jailbreaking, prompt injection, and privacy leakage attacks. This study investigates the impact of these modifications on LLM safety, a critical consideration for building reliable and secure AI systems.
arXiv Detail & Related papers (2024-04-05T20:31:45Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics [54.57914943017522]
We highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications.
arXiv Detail & Related papers (2024-02-15T22:01:45Z)
Is Your Kettle Smarter Than a Hacker? A Scalable Tool for Assessing Replay Attack Vulnerabilities on Consumer IoT Devices [1.5612101323427952]
ENISA and NIST security guidelines emphasize the importance of enabling default local communication for safety and reliability. We propose a tool, named REPLIOT, able to test whether a replay attack is successful or not, without prior knowledge of the target devices. We find that 75% of the remaining devices are vulnerable to replay attacks with REPLIOT having a detection accuracy of 0.98-1.
arXiv Detail & Related papers (2024-01-22T18:24:41Z)
The Ethics of Interaction: Mitigating Security Threats in LLMs [1.407080246204282]
The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy. We scrutinize five major threats--prompt injection, jailbreaking, Personal Identifiable Information (PII) exposure, sexually explicit content, and hate-based content--to assess their critical ethical consequences and the urgency they create for robust defensive strategies.
arXiv Detail & Related papers (2024-01-22T17:11:37Z)
SODA: Protecting Proprietary Information in On-Device Machine Learning Models [5.352699766206808]
We present an end-to-end framework, SODA, for deploying and serving on edge devices while defending against adversarial usage. Our results demonstrate that SODA can detect adversarial usage with 89% accuracy in less than 50 queries with minimal impact on service performance, latency, and storage.
arXiv Detail & Related papers (2023-12-22T20:04:36Z)
A Novel IoT Trust Model Leveraging Fully Distributed Behavioral Fingerprinting and Secure Delegation [3.10770247120758]
Internet of Things (IoT) solutions are experimenting a booming demand to make data collection and processing easier. The higher the number of new capabilities and services provided in an autonomous way, the wider the attack surface that exposes users to data hacking and lost. In this paper, we try to provide a contribution in this setting, tackling the non-trivial issues of equipping smart things with a strategy to evaluate, also through their neighbors, the trustworthiness of an object in the network before interacting with it.
arXiv Detail & Related papers (2023-10-02T07:45:49Z)
Vulnerability of Machine Learning Approaches Applied in IoT-based Smart Grid: A Review [51.31851488650698]
Machine learning (ML) sees an increasing prevalence of being used in the internet-of-things (IoT)-based smart grid. adversarial distortion injected into the power signal will greatly affect the system's normal control and operation. It is imperative to conduct vulnerability assessment for MLsgAPPs applied in the context of safety-critical power systems.
arXiv Detail & Related papers (2023-08-30T03:29:26Z)
Getting pwn'd by AI: Penetration Testing with Large Language Models [0.0]
This paper explores the potential usage of large-language models, such as GPT3.5, to augment penetration testers with AI sparring partners. We explore the feasibility of supplementing penetration testers with AI models for two distinct use cases: high-level task planning for security testing assignments and low-level vulnerability hunting within a vulnerable virtual machine.
arXiv Detail & Related papers (2023-07-24T19:59:22Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)
Is this IoT Device Likely to be Secure? Risk Score Prediction for IoT Devices Using Gradient Boosting Machines [11.177584118932572]
Security risk assessment and prediction are critical for organisations deploying Internet of Things (IoT) devices. This paper proposes a novel risk prediction for IoT devices based on publicly available information about them.
arXiv Detail & Related papers (2021-11-23T13:41:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.