Promoting Online Safety by Simulating Unsafe Conversations with LLMs
- URL: http://arxiv.org/abs/2507.22267v1
- Date: Tue, 29 Jul 2025 22:38:21 GMT
- Title: Promoting Online Safety by Simulating Unsafe Conversations with LLMs
- Authors: Owen Hoffman, Kangze Peng, Zehua You, Sajid Kamal, Sukrit Venkatagiri
- Abstract summary: Large language models (LLMs) have the potential -- and already are being used -- to increase the speed, scale, and types of unsafe conversations online. In our current work, we explore ways to promote online safety by teaching people about unsafe conversations that can occur online with and without LLMs.
- Score: 1.7243216387069678
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generative AI, including large language models (LLMs), has the potential -- and is already being used -- to increase the speed, scale, and types of unsafe conversations online. LLMs in particular lower the barrier to entry for bad actors to create unsafe conversations because of their ability to generate persuasive, human-like text. In our current work, we explore ways to promote online safety by teaching people about unsafe conversations that can occur online with and without LLMs. We build on prior work showing that LLMs can successfully simulate scam conversations. We also leverage research in the learning sciences showing that providing feedback on one's hypothetical actions can promote learning. In particular, we focus on simulating scam conversations using LLMs. Our work incorporates two LLMs that converse with each other, a scammer LLM and a target LLM, to simulate realistic unsafe conversations that people may encounter online; users of our system are asked to provide feedback to the target LLM.
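The abstract describes a loop in which a scammer LLM and a target LLM converse while a human user coaches the target. The sketch below shows one way such a loop could be wired up; the prompts, the model name, and the OpenAI-style API calls are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a scammer-LLM / target-LLM simulation with human feedback.
# Assumes an OpenAI-compatible endpoint and API key; all prompts are made up.
from openai import OpenAI

client = OpenAI()

SCAMMER = "Role-play a scammer in a controlled online-safety training simulation."
TARGET = "Role-play an ordinary internet user who just received this message."

def reply(system_prompt: str, transcript: list[tuple[str, str]], me: str) -> str:
    # Each agent sees its own lines as "assistant" turns and everything else
    # (the other agent, plus human coaching) as "user" turns.
    messages = [{"role": "system", "content": system_prompt}]
    for speaker, text in transcript:
        role = "assistant" if speaker == me else "user"
        messages.append({"role": role, "content": text})
    out = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return out.choices[0].message.content

transcript = [("scammer", "Hi! There is an urgent problem with your account...")]
for _ in range(3):  # a few simulated turns
    # Per the abstract, the human user's feedback is given to the target LLM,
    # coaching its hypothetical next move; here it is a simple stdin prompt.
    feedback = input(f"Last message: {transcript[-1][1]}\nFeedback for the target: ")
    transcript.append(("feedback", f"[Coach feedback: {feedback}]"))
    transcript.append(("target", reply(TARGET, transcript, me="target")))
    transcript.append(("scammer", reply(SCAMMER, transcript, me="scammer")))
```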
Related papers
- The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions [0.4605116997238364]
Large Language Models (LLMs) offer new avenues to simulate online communities and social media. We evaluated whether LLMs can convincingly mimic human group conversations on social media.
arXiv Detail & Related papers (2025-10-29T17:01:20Z) - Sword and Shield: Uses and Strategies of LLMs in Navigating Disinformation [9.761926423405617]
Large Language Models (LLMs) can be weaponised to produce sophisticated and persuasive disinformation, yet they also hold promise for enhancing detection and mitigation strategies. This paper investigates the complex dynamics between LLMs and disinformation through a communication game that simulates online forums, inspired by the game Werewolf, with 25 participants. Our findings highlight the varying uses of LLMs depending on the participants' roles and strategies, underscoring the importance of understanding their effectiveness in this context.
arXiv Detail & Related papers (2025-06-08T16:24:11Z) - Can a large language model be a gaslighter? [18.39951259823815]
Large language models (LLMs) have gained human trust due to their capabilities and helpfulness.
This in turn may allow LLMs to affect users' mindsets by manipulating language.
In this work, we aim to investigate the vulnerability of LLMs under prompt-based and fine-tuning-based gaslighting attacks.
arXiv Detail & Related papers (2024-10-11T18:35:27Z) - Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue [10.101013733390532]
Large Language Models (LLMs) have been demonstrated to generate illegal or unethical responses.
This paper argues that humans could exploit multi-turn dialogue to induce LLMs into generating harmful information.
arXiv Detail & Related papers (2024-02-27T07:11:59Z) - LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning [67.39585115936329]
We argue that LLMs have inherent capabilities to handle long contexts without fine-tuning.
We propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information; a rough sketch of the underlying position remapping appears after this list.
We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length.
arXiv Detail & Related papers (2024-01-02T18:30:51Z) - Boosting Large Language Model for Speech Synthesis: An Empirical Study [86.89548753080432]
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision.
We conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E.
We compare three integration methods between LLMs and speech models: directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder.
arXiv Detail & Related papers (2023-12-30T14:20:04Z) - A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly [21.536079040559517]
Large Language Models (LLMs) have revolutionized natural language understanding and generation.
This paper explores the intersection of LLMs with security and privacy.
arXiv Detail & Related papers (2023-12-04T16:25:18Z) - Negotiating with LLMS: Prompt Hacks, Skill Gaps, and Reasoning Deficits [1.2818275315985972]
We conduct a user study engaging over 40 individuals across all age groups in price negotiations with an LLM.
We show that the negotiated prices humans manage to achieve span a broad range, which points to a literacy gap in effectively interacting with LLMs.
arXiv Detail & Related papers (2023-11-26T08:44:58Z) - AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations [52.43593893122206]
AlignedCoT is an in-context learning technique for prompting Large Language Models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
arXiv Detail & Related papers (2023-11-22T17:24:21Z) - Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z) - In-Context Impersonation Reveals Large Language Models' Strengths and Biases [56.61129643802483]
We ask LLMs to assume different personas before solving vision and language tasks.
We find that LLMs pretending to be children of different ages recover human-like developmental stages.
In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts.
arXiv Detail & Related papers (2023-05-24T09:13:15Z) - Multi-step Jailbreaking Privacy Attacks on ChatGPT [47.10284364632862]
We study the privacy threats from OpenAI's ChatGPT and the New Bing enhanced by ChatGPT.
We conduct extensive experiments to support our claims and discuss LLMs' privacy implications.
arXiv Detail & Related papers (2023-04-11T13:05:04Z) - Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
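As flagged in the SelfExtend entry above, its bi-level attention can be pictured as a remapping of relative positions: nearby tokens keep their exact positions, while distant tokens get coarse, floor-divided ("grouped") positions, so a model trained on short contexts never sees an unfamiliar relative distance. The numpy sketch below illustrates that remapping under assumed group-size and neighbor-window values; it is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def self_extend_positions(seq_len: int, group: int = 4, neighbor: int = 8) -> np.ndarray:
    # Rough sketch of SelfExtend-style bi-level relative positions.
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    rel = i - j                      # standard causal relative distances
    # Distant tokens: floor-divide by the group size; the shift keeps the
    # mapping continuous at the neighbor-window boundary.
    grouped = rel // group + (neighbor - neighbor // group)
    return np.where(rel < neighbor, rel, grouped)

# Relative positions beyond `neighbor` now grow `group` times more slowly,
# keeping them inside the range the model saw during pretraining.
print(self_extend_positions(12, group=4, neighbor=4))
```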
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.