User-Centered Security in Natural Language Processing
- URL: http://arxiv.org/abs/2301.04230v1
- Date: Tue, 10 Jan 2023 22:34:19 GMT
- Title: User-Centered Security in Natural Language Processing
- Authors: Chris Emmery
- Abstract summary: This dissertation proposes a framework of user-centered security in Natural Language Processing (NLP).
It focuses on two security domains within NLP of great public interest.
- Score: 0.7106986689736825
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This dissertation proposes a framework of user-centered security in Natural
Language Processing (NLP), and demonstrates how it can improve the
accessibility of related research. Accordingly, it focuses on two security
domains within NLP with great public interest. First, that of author profiling,
which can be employed to compromise online privacy through invasive inferences.
Without access to, and detailed insight into, these models' predictions, there is no
reasonable heuristic by which Internet users might defend themselves from such
inferences. Secondly, that of cyberbullying detection, which by default
presupposes a centralized implementation; i.e., content moderation across
social platforms. As access to appropriate data is restricted, and the nature
of the task rapidly evolves (both through lexical variation, and cultural
shifts), the effectiveness of its classifiers is greatly diminished and thereby
often misrepresented.
Under the proposed framework, we predominantly investigate the use of
adversarial attacks on language; i.e., changing a given input (generating
adversarial samples) such that a given model does not function as intended.
These attacks form a common thread between our user-centered security problems;
they are highly relevant for privacy-preserving obfuscation methods against
author profiling, and adversarial samples might also prove useful to assess the
influence of lexical variation and augmentation on cyberbullying detection.
Related papers
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- Watching the AI Watchdogs: A Fairness and Robustness Analysis of AI Safety Moderation Classifiers [5.35599092568615]
AI Safety Moderation (ASM) classifiers are designed to moderate content on social media platforms.
It is crucial to ensure that these classifiers do not unfairly classify content belonging to users from minority groups.
We thus examine the fairness and robustness of four widely-used, closed-source ASM classifiers.
arXiv Detail & Related papers (2025-01-23T01:04:00Z)
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models [3.9490749767170636]
Large language models (LLMs) have revolutionized text generation, translation, and question-answering tasks.
Despite their widespread use, LLMs present challenges such as ethical dilemmas when models are compelled to respond inappropriately.
This paper addresses these challenges by introducing a multi-pronged approach that includes: 1) filtering sensitive vocabulary from user input to prevent unethical responses; 2) detecting role-playing to halt interactions that could lead to 'prison break' scenarios; and 4) extending these methodologies to various LLM derivatives like Multi-Model Large Language Models (MLLMs).
arXiv Detail & Related papers (2024-01-27T08:09:33Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- PROFL: A Privacy-Preserving Federated Learning Method with Stringent Defense Against Poisoning Attacks [2.6487166137163007]
Federated Learning (FL) faces two major issues: privacy leakage and poisoning attacks.
We propose a novel privacy-preserving Byzantine-robust FL framework PROFL.
PROFL is based on a two-trapdoor additively homomorphic encryption algorithm and blinding techniques.
arXiv Detail & Related papers (2023-12-02T06:34:37Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- On the Privacy Risks of Algorithmic Recourse [17.33484111779023]
We make the first attempt at investigating if and how an adversary can leverage recourses to infer private information about the underlying model's training data.
Our work establishes unintended privacy leakage as an important risk in the widespread adoption of recourse methods.
arXiv Detail & Related papers (2022-11-10T09:04:24Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.