NLP Security and Ethics, in the Wild
- URL: http://arxiv.org/abs/2504.06669v1
- Date: Wed, 09 Apr 2025 08:12:34 GMT
- Title: NLP Security and Ethics, in the Wild
- Authors: Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva
- Abstract summary: Research ethics of NLPSec have not yet faced many of the long-standing conundrums pertinent to cybersecurity. We identify trends across the literature, ultimately finding alarming gaps on topics like harm minimization and responsible disclosure. The goal of this work is to help cultivate an intentional culture of ethical research for those working in NLP Security.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As NLP models are used by a growing number of end-users, an area of increasing importance is NLP Security (NLPSec): assessing the vulnerability of models to malicious attacks and developing comprehensive countermeasures against them. While work at the intersection of NLP and cybersecurity has the potential to create safer NLP for all, accidental oversights can result in tangible harm (e.g., breaches of privacy or proliferation of malicious models). In this emerging field, however, the research ethics of NLP have until now not had to face many of the long-standing conundrums pertinent to cybersecurity. We thus examine contemporary works across NLPSec and explore their engagement with cybersecurity's ethical norms. We identify trends across the literature, ultimately finding alarming gaps on topics like harm minimization and responsible disclosure. To alleviate these concerns, we provide concrete recommendations to help NLP researchers navigate this space more ethically, bridging the gap between traditional cybersecurity and NLP ethics, which we frame as "white hat NLP". The goal of this work is to help cultivate an intentional culture of ethical research for those working in NLP Security.
Related papers
- A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations (2025-02-14)
This survey provides a comprehensive analysis of LVLM safety, covering key aspects such as attacks, defenses, and evaluation methods.
We introduce a unified framework that integrates these interrelated components, offering a holistic perspective on the vulnerabilities of LVLMs.
We conduct a set of safety evaluations on the latest LVLM, DeepSeek Janus-Pro, and provide a theoretical analysis of the results.
arXiv Detail & Related papers (2025-02-14T08:42:43Z) - Global Challenge for Safe and Secure LLMs Track 1 [57.08717321907755]
This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks.
arXiv Detail & Related papers (2024-11-21T08:20:31Z) - Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science [65.77763092833348]
Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.
While their capabilities are promising, these agents also introduce novel vulnerabilities that demand careful consideration for safety.
This paper conducts a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures.
arXiv Detail & Related papers (2024-02-06T18:54:07Z) - The Ethics of Interaction: Mitigating Security Threats in LLMs [1.407080246204282]
The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy.
We scrutinize five major threats: prompt injection, jailbreaking, Personally Identifiable Information (PII) exposure, sexually explicit content, and hate-based content, assessing their critical ethical consequences and the urgency they create for robust defensive strategies.
arXiv Detail & Related papers (2024-01-22T17:11:37Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and
Vulnerabilities [14.684194175806203]
Large language models (LLMs) can be misused for fraud, impersonation, and the generation of malware.
We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures.
arXiv Detail & Related papers (2023-08-24T14:45:50Z) - Examining risks of racial biases in NLP tools for child protective
services [78.81107364902958]
We focus on one high-stakes setting: child protective services (CPS).
Given well-established racial bias in this setting, we investigate possible ways deployed NLP is liable to increase racial disparities.
We document consistent algorithmic unfairness in named entity recognition (NER) models, possible algorithmic unfairness in coreference resolution models, and little evidence of exacerbated racial bias in risk prediction.
arXiv Detail & Related papers (2023-05-30T21:00:47Z) - Beyond the Safeguards: Exploring the Security Risks of ChatGPT [3.1981440103815717]
The increasing popularity of large language models (LLMs) has led to growing concerns about their safety, security risks, and ethical implications.
This paper aims to provide an overview of the different types of security risks associated with ChatGPT, including malicious text and code generation, private data disclosure, fraudulent services, information gathering, and producing unethical content.
arXiv Detail & Related papers (2023-05-13T21:01:14Z) - Why Should Adversarial Perturbations be Imperceptible? Rethink the
Research Paradigm in Adversarial NLP [83.66405397421907]
We rethink the research paradigm of textual adversarial samples in security scenarios.
We first collect, process, and release a collection of security datasets, Advbench.
Next, we propose a simple rule-based method that can easily fulfill actual adversarial goals, simulating real-world attack methods.
arXiv Detail & Related papers (2022-10-19T15:53:36Z) - Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing [7.148585157154561]
We find that the Final Rule, the common ethical framework used by researchers, did not anticipate the use of online crowdsourcing platforms for data collection.
We enumerate common scenarios where crowdworkers performing NLP tasks are at risk of harm.
We recommend that researchers evaluate these risks by considering the three ethical principles set out in the Belmont Report.
arXiv Detail & Related papers (2021-04-20T16:30:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.