Modelling Direct Messaging Networks with Multiple Recipients for Cyber
Deception
- URL: http://arxiv.org/abs/2111.11932v1
- Date: Sun, 21 Nov 2021 10:18:48 GMT
- Title: Modelling Direct Messaging Networks with Multiple Recipients for Cyber
Deception
- Authors: Kristen Moore, Cody J. Christopher, David Liebowitz, Surya Nepal,
Renee Selvey
- Abstract summary: We propose a framework to automate the generation of email and instant messaging-style group communications at scale.
We address two key aspects of simulating this type of system: modelling when and with whom participants communicate, and generating topical, multi-party text to populate simulated conversation threads.
We demonstrate the use of fine-tuned, pre-trained language models to generate convincing multi-party conversation threads.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Cyber deception is emerging as a promising approach to defending networks and
systems against attackers and data thieves. However, while deceptive
technologies are relatively cheap to deploy, generating realistic content at
scale is very costly, because rich, interactive deceptive artefacts are largely
hand-crafted. With recent improvements in Machine Learning, we now have the
opportunity to bring scale and automation to the creation of realistic and
enticing simulated content. In this work, we propose a framework to automate
the generation of email and instant messaging-style group communications at
scale. Such messaging platforms within organisations contain a lot of valuable
information inside private communications and document attachments, making them
an enticing target for an adversary. We address two key aspects of simulating
this type of system: modelling when and with whom participants communicate, and
generating topical, multi-party text to populate simulated conversation
threads. We present the LogNormMix-Net Temporal Point Process as an approach to
the first of these, building upon the intensity-free modeling approach of
Shchur et al. (shchur2019intensity) to create a generative model for
unicast and multi-cast communications. We demonstrate the use of fine-tuned,
pre-trained language models to generate convincing multi-party conversation
threads. A live email server is simulated by uniting our LogNormMix-Net TPP (to
generate the communication timestamp, sender and recipients) with the language
model, which generates the contents of the multi-party email threads. We
evaluate the generated content against a number of realism-based properties
that encourage the model to generate content capable of engaging an
adversary's attention and achieving a deception outcome.
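The intensity-free idea behind the LogNormMix-Net TPP can be illustrated with a minimal sketch: inter-event times are drawn from a mixture of log-normal distributions, and each event is assigned a sender and a (possibly multi-cast) recipient set. This is not the authors' implementation; in the real model the mixture parameters are produced by a neural network conditioned on the event history, and all parameter values, user names, and helper functions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture parameters (fixed constants here; in the actual model these would
# be predicted by a network from the communication history).
weights = np.array([0.6, 0.3, 0.1])   # mixture weights, sum to 1
means   = np.array([1.0, 3.0, 5.0])   # means of log(inter-event time)
sigmas  = np.array([0.5, 0.4, 0.3])   # std devs of log(inter-event time)

users = ["alice", "bob", "carol", "dave"]

def sample_inter_event_time():
    """Draw one inter-event time from the log-normal mixture."""
    k = rng.choice(len(weights), p=weights)
    return rng.lognormal(mean=means[k], sigma=sigmas[k])

def sample_event(t):
    """Attach a sender and a multi-cast recipient set to a timestamp."""
    sender = rng.choice(users)
    others = [u for u in users if u != sender]
    n_recipients = rng.integers(1, len(others) + 1)
    recipients = list(rng.choice(others, size=n_recipients, replace=False))
    return {"time": t, "sender": sender, "recipients": recipients}

# Generate a short synthetic communication log.
t, events = 0.0, []
for _ in range(5):
    t += sample_inter_event_time()
    events.append(sample_event(t))

for e in events:
    print(f"{e['time']:8.2f}  {e['sender']} -> {', '.join(e['recipients'])}")
```

In the full framework described above, each sampled (timestamp, sender, recipients) tuple would then be handed to a fine-tuned language model that writes the body of the corresponding email or message thread.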
Related papers
- Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system.
A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density.
A Private Knowledge Base (KB) based on a Large Language Model (LLM) performs semantic disambiguation and correction.
A Generative Semantic Reconstructor (GSR) uses the BERT-VITS2 and SadTalker models to transform text back into a high-QoE talking-face video.
arXiv Detail & Related papers (2024-11-06T12:45:46Z) - Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition [48.527630771422935]
We propose a synthetic data generation pipeline for multi-speaker conversational ASR.
We conduct evaluation by fine-tuning the Whisper ASR model for telephone and distant conversational speech settings.
arXiv Detail & Related papers (2024-08-17T14:47:05Z) - Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions [17.96479268328824]
We investigate the feasibility of creating realistic, large-scale synthetic datasets of user-generated content.
We propose a multi-step generation process, predicated on the idea of creating compact representations of discussion threads.
arXiv Detail & Related papers (2024-08-15T18:43:50Z) - Modeling Real-Time Interactive Conversations as Timed Diarized Transcripts [11.067252960486272]
We present a simple yet general method to simulate real-time interactive conversations using pretrained language models.
We demonstrate the promise of this method with two case studies: instant messenger dialogues and spoken conversations.
arXiv Detail & Related papers (2024-05-21T21:14:31Z) - TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild [102.93338424976959]
We introduce TextBind, an almost annotation-free framework for empowering large language models with multi-turn interleaved instruction-following capabilities.
Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model.
To accommodate interleaved image-text inputs and outputs, we devise MIM, a language model-centric architecture that seamlessly integrates image encoder and decoder models.
arXiv Detail & Related papers (2023-09-14T15:34:01Z) - Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z) - Grounding Language Models to Images for Multimodal Inputs and Outputs [89.30027812161686]
We propose an efficient method to ground pretrained text-only language models to the visual domain.
We process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images.
arXiv Detail & Related papers (2023-01-31T18:33:44Z) - Representation Learning for Conversational Data using Discourse Mutual
Information Maximization [9.017156603976915]
We argue that the structure-unaware word-by-word generation is not suitable for effective conversation modeling.
We propose a structure-aware Mutual Information based loss-function DMI for training dialog-representation models.
Our models show the most promising performance on the dialog evaluation task DailyDialog++, in both random and adversarial negative scenarios.
arXiv Detail & Related papers (2021-12-04T13:17:07Z) - Building Goal-Oriented Dialogue Systems with Situated Visual Context [12.014793558784955]
With the surge of virtual assistants with screens, the next generation of agents is required to understand screen context.
We propose a novel multimodal conversational framework in which the dialogue agent's next action and its arguments are derived jointly, conditioned on both the conversational and the visual context.
Our model can recognize visual features such as color and shape, as well as metadata-based features such as the price or star rating associated with a visual entity.
arXiv Detail & Related papers (2021-11-22T23:30:52Z) - Dialog Simulation with Realistic Variations for Training Goal-Oriented
Conversational Systems [14.206866126142002]
Goal-oriented dialog systems enable users to complete specific goals like requesting information about a movie or booking a ticket.
We propose an approach for automatically creating a large corpus of annotated dialogs from a few thoroughly annotated sample dialogs and the dialog schema.
We achieve 18-50% relative accuracy on a held-out test set compared to a baseline dialog generation approach.
arXiv Detail & Related papers (2020-11-16T19:39:15Z) - Plug-and-Play Conversational Models [62.77150879036442]
We introduce an approach that requires neither further computation at decoding time nor any fine-tuning of a large language model.
We demonstrate, through extensive automatic and human evaluation, a high degree of control over the generated conversational responses with regard to multiple desired attributes.
arXiv Detail & Related papers (2020-10-09T03:17:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.