Trustworthy AI: Securing Sensitive Data in Large Language Models
- URL: http://arxiv.org/abs/2409.18222v1
- Date: Thu, 26 Sep 2024 19:02:33 GMT
- Title: Trustworthy AI: Securing Sensitive Data in Large Language Models
- Authors: Georgios Feretzakis, Vassilios S. Verykios
- Abstract summary: Large Language Models (LLMs) have transformed natural language processing (NLP) by enabling robust text generation and understanding.
This paper proposes a comprehensive framework for embedding trust mechanisms into LLMs to dynamically control the disclosure of sensitive information.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have transformed natural language processing (NLP) by enabling robust text generation and understanding. However, their deployment in sensitive domains like healthcare, finance, and legal services raises critical concerns about privacy and data security. This paper proposes a comprehensive framework for embedding trust mechanisms into LLMs to dynamically control the disclosure of sensitive information. The framework integrates three core components: User Trust Profiling, Information Sensitivity Detection, and Adaptive Output Control. By leveraging techniques such as Role-Based Access Control (RBAC), Attribute-Based Access Control (ABAC), Named Entity Recognition (NER), contextual analysis, and privacy-preserving methods like differential privacy, the system ensures that sensitive information is disclosed appropriately based on the user's trust level. By focusing on balancing data utility and privacy, the proposed solution offers a novel approach to securely deploying LLMs in high-risk environments. Future work will focus on testing this framework across various domains to evaluate its effectiveness in managing sensitive data while maintaining system efficiency.
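The abstract describes the framework only at the architectural level. The following Python sketch is an illustration of how the three components could interact, not the paper's implementation: the role table, trust thresholds, and the regex patterns standing in for a real NER model are all assumptions.

```python
import re
from dataclasses import dataclass

# Illustrative trust levels for Role-Based Access Control (RBAC);
# the paper's actual roles and thresholds are not specified here.
ROLE_TRUST = {"public": 0, "employee": 1, "clinician": 2, "auditor": 3}

@dataclass
class TrustProfile:
    """User Trust Profiling: an RBAC role plus ABAC-style attributes."""
    role: str
    attributes: dict  # e.g. {"department": "cardiology"}

    @property
    def level(self) -> int:
        return ROLE_TRUST.get(self.role, 0)

# Stand-in for Named Entity Recognition (NER): toy patterns for
# illustration only; a deployed system would use a trained NER model.
SENSITIVE_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_sensitive_spans(text: str):
    """Information Sensitivity Detection: locate and label sensitive spans."""
    spans = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return spans

def adaptive_redact(text: str, profile: TrustProfile, required_level: int = 2) -> str:
    """Adaptive Output Control: disclose or redact based on the user's trust level."""
    if profile.level >= required_level:
        return text  # sufficiently trusted user sees the full output
    redacted = text
    # Replace spans from the end of the string so earlier offsets stay valid.
    for start, end, label in sorted(detect_sensitive_spans(text), reverse=True):
        redacted = redacted[:start] + f"[{label} REDACTED]" + redacted[end:]
    return redacted

if __name__ == "__main__":
    draft = "Patient reachable at jane.doe@example.com, SSN 123-45-6789."
    print(adaptive_redact(draft, TrustProfile("public", {})))
    print(adaptive_redact(draft, TrustProfile("clinician", {"department": "cardiology"})))
```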
Related papers
- Data Obfuscation through Latent Space Projection (LSP) for Privacy-Preserving AI Governance: Case Studies in Medical Diagnosis and Finance Fraud Detection [0.0]
This paper introduces Data Obfuscation through Latent Space Projection (LSP), a novel technique aimed at enhancing AI governance and ensuring Responsible AI compliance.
LSP uses machine learning to project sensitive data into a latent space, effectively obfuscating it while preserving essential features for model training and inference.
We validate LSP's effectiveness through experiments on benchmark datasets and two real-world case studies: healthcare cancer diagnosis and financial fraud analysis.
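The summary above states only that LSP projects sensitive records into a latent space while keeping features useful for training. As a rough illustration under that description, the sketch below uses a truncated-SVD (PCA-style) linear projection as the latent space; the latent dimensionality and the projection method are assumptions, not the paper's technique.

```python
import numpy as np

def fit_latent_projection(X: np.ndarray, latent_dim: int = 4):
    """Fit a linear projection (here: truncated SVD) into a latent space."""
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data give projection directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def obfuscate(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project records into the latent space: raw attribute values are no
    longer directly readable, while coarse structure useful for training
    is retained."""
    return (X - mean) @ components.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    records = rng.normal(size=(100, 10))      # toy "sensitive" records
    mean, comps = fit_latent_projection(records)
    latent = obfuscate(records, mean, comps)  # shape (100, 4)
    print(latent.shape)
```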
arXiv Detail & Related papers (2024-10-22T22:31:03Z)
- Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning [9.719986610417441]
Guardrails have become an integral part of Large Language Models (LLMs).
This study introduces an adaptive guardrail mechanism, supported by trust modeling and enhanced with in-context learning.
By leveraging a combination of direct interaction trust and authority-verified trust, the system precisely tailors the strictness of content moderation to align with the user's credibility.
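How the two trust signals are combined is not specified in the summary; the sketch below simply blends direct-interaction trust with authority-verified trust and maps the result to a moderation tier. The weighting and thresholds are illustrative assumptions.

```python
def combined_trust(interaction_trust: float,
                   authority_trust: float,
                   alpha: float = 0.6) -> float:
    """Blend direct-interaction trust with authority-verified trust.

    Both inputs are assumed to lie in [0, 1]; alpha is an assumed weight.
    """
    return alpha * interaction_trust + (1 - alpha) * authority_trust

def moderation_strictness(trust: float) -> str:
    """Map the blended trust score to a guardrail strictness tier."""
    if trust >= 0.8:
        return "relaxed"   # trusted user: fewer refusals
    if trust >= 0.5:
        return "standard"
    return "strict"        # low trust: aggressive filtering

if __name__ == "__main__":
    print(moderation_strictness(combined_trust(0.9, 0.7)))  # relaxed
    print(moderation_strictness(combined_trust(0.2, 0.1)))  # strict
```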
arXiv Detail & Related papers (2024-08-16T18:07:48Z)
- Complete Security and Privacy for AI Inference in Decentralized Systems [14.526663289437584]
Large models are crucial for tasks like diagnosing diseases, but they tend to be fragile and difficult to scale.
Nesa solves these challenges with a comprehensive framework using multiple techniques to protect data and model outputs.
Nesa's state-of-the-art proofs and principles demonstrate the framework's effectiveness.
arXiv Detail & Related papers (2024-07-28T05:09:17Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
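The three components are named but not specified here. The loop below is a hedged sketch of how a privacy evaluator, a utility evaluator, and an optimizer could interact; the scoring and rewrite functions are stubs standing in for the paper's LLM-based components.

```python
from typing import Callable

def optimize_anonymization(text: str,
                           candidates: Callable[[str], list],
                           privacy_score: Callable[[str], float],
                           utility_score: Callable[[str], float],
                           rounds: int = 3,
                           min_privacy: float = 0.9) -> str:
    """Iteratively pick rewrites that keep privacy above a floor while
    maximizing utility; the evaluators are placeholders for LLM-based ones."""
    best = text
    for _ in range(rounds):
        options = [c for c in candidates(best) if privacy_score(c) >= min_privacy]
        if not options:
            break
        best = max(options, key=utility_score)
    return best

if __name__ == "__main__":
    result = optimize_anonymization(
        "Jane Doe visited Boston General on May 2.",
        candidates=lambda t: [t.replace("Jane Doe", "the patient"),
                              t.replace("Boston General", "a hospital")],
        privacy_score=lambda t: 1.0 if "Jane Doe" not in t else 0.0,
        utility_score=lambda t: len(t.split()),  # toy proxy for retained detail
    )
    print(result)
```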
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring that models are 'almost indistinguishable' whether trained with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
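User-level DP bounds each user's total influence on the model rather than each example's. One common way to realize this (an illustration, not necessarily the paper's algorithm) is to clip every user's aggregated gradient to a fixed norm and add Gaussian noise before averaging; the clip norm and noise multiplier below are assumed values.

```python
import numpy as np

def user_level_dp_update(per_user_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Aggregate one model update with user-level clipping and Gaussian noise.

    Each entry of per_user_grads is a single user's already-averaged
    gradient, so clipping bounds that user's total contribution.
    """
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_user_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_user_grads)

if __name__ == "__main__":
    users = [np.random.default_rng(i).normal(size=5) for i in range(8)]
    print(user_level_dp_update(users))
```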
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Data Collaboration Analysis Over Matrix Manifolds [0.0]
Privacy-Preserving Machine Learning (PPML) addresses the challenge of safeguarding sensitive information shared across institutions.
The NRI-DC framework emerges as an innovative approach, potentially resolving the 'data island' issue among institutions.
This study establishes a rigorous theoretical foundation for these collaboration functions and introduces new formulations.
arXiv Detail & Related papers (2024-03-05T08:52:16Z)
- State-of-the-Art Approaches to Enhancing Privacy Preservation of Machine Learning Datasets: A Survey [0.0]
This paper examines the evolving landscape of machine learning (ML) and its profound impact across various sectors.
It focuses on the emerging field of Privacy-Preserving Machine Learning (PPML).
As ML applications become increasingly integral to industries like telecommunications, financial technology, and surveillance, they raise significant privacy concerns.
arXiv Detail & Related papers (2024-02-25T17:31:06Z)
- Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since the involved data often contain sensitive information, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows data owners to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, the inability to explain their decisions, and bias in their training data are some of the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- CryptoSPN: Privacy-preserving Sum-Product Network Inference [84.88362774693914]
We present a framework for privacy-preserving inference of sum-product networks (SPNs).
CryptoSPN achieves highly efficient and accurate inference in the order of seconds for medium-sized SPNs.
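The summary gives no algorithmic detail, so the sketch below only shows what plaintext sum-product network (SPN) inference computes: a bottom-up pass in which product nodes multiply their children's values and sum nodes take weighted mixtures of theirs. CryptoSPN's cryptographic layer (secure computation over these operations) is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a toy sum-product network (SPN).

    kind is "leaf", "sum", or "product". Leaves hold a probability for the
    observed value of their variable; sum nodes hold mixture weights.
    """
    kind: str
    children: list = field(default_factory=list)
    weights: list = field(default_factory=list)  # only for sum nodes
    leaf_prob: float = 1.0                       # only for leaves

def evaluate(node: Node) -> float:
    """Bottom-up SPN inference on plaintext values."""
    if node.kind == "leaf":
        return node.leaf_prob
    if node.kind == "product":
        result = 1.0
        for child in node.children:
            result *= evaluate(child)
        return result
    # sum node: weighted mixture of its children
    return sum(w * evaluate(c) for w, c in zip(node.weights, node.children))

if __name__ == "__main__":
    # P(X1, X2) modeled as a two-component mixture of independent leaves.
    x1_a, x2_a = Node("leaf", leaf_prob=0.9), Node("leaf", leaf_prob=0.2)
    x1_b, x2_b = Node("leaf", leaf_prob=0.4), Node("leaf", leaf_prob=0.7)
    spn = Node("sum",
               children=[Node("product", children=[x1_a, x2_a]),
                         Node("product", children=[x1_b, x2_b])],
               weights=[0.6, 0.4])
    print(evaluate(spn))  # 0.6*0.9*0.2 + 0.4*0.4*0.7 = 0.22
```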
arXiv Detail & Related papers (2020-02-03T14:49:18Z)