Modeling Coherency in Generated Emails by Leveraging Deep Neural
Learners
- URL: http://arxiv.org/abs/2007.07403v1
- Date: Tue, 14 Jul 2020 23:47:08 GMT
- Title: Modeling Coherency in Generated Emails by Leveraging Deep Neural
Learners
- Authors: Avisha Das and Rakesh M. Verma
- Abstract summary: Advanced machine learning and natural language techniques enable attackers to launch sophisticated and targeted social engineering-based attacks.
Email masquerading using targeted emails to fool the victim is an advanced attack method.
We demonstrate the generation of short and targeted text messages using the deep model.
- Score: 6.891238879512674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Advanced machine learning and natural language techniques enable attackers to
launch sophisticated and targeted social engineering-based attacks. To counter
the active attacker issue, researchers have since resorted to proactive methods
of detection. Email masquerading using targeted emails to fool the victim is an
advanced attack method. However, automatic text generation requires controlling
the context and coherency of the generated content, which has been identified
as an increasingly difficult problem. The method used leverages a hierarchical
deep neural model which uses a learned representation of the sentences in the
input document to generate structured written emails. We demonstrate the
generation of short and targeted text messages using the deep model. The global
coherency of the synthesized text is evaluated using a qualitative study as
well as multiple quantitative measures.
Related papers
- Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models [35.67613230687864]
Large Language Models (LLMs) are trained at scale and endowed with powerful text-generating abilities.
We propose a new, theoretically grounded approach to combine their respective strengths.
Our experiments, using a variety of generator LLMs, suggest that our method effectively increases the robustness of detection.
arXiv Detail & Related papers (2024-09-11T20:55:12Z)
- Detecting, Explaining, and Mitigating Memorization in Diffusion Models [49.438362005962375]
We introduce a straightforward yet effective method for detecting memorized prompts by inspecting the magnitude of text-conditional predictions.
Our proposed method seamlessly integrates without disrupting sampling algorithms, and delivers high accuracy even at the first generation step.
Building on our detection strategy, we unveil an explainable approach that shows the contribution of individual words or tokens to memorization.
arXiv Detail & Related papers (2024-07-31T16:13:29Z)
- Prompted Contextual Vectors for Spear-Phishing Detection [45.07804966535239]
Spear-phishing attacks present a significant security challenge.
We propose a detection approach based on a novel document vectorization method.
Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails.
arXiv Detail & Related papers (2024-02-13T09:12:55Z)
- Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques [0.0]
We propose a bot identification algorithm based on unsupervised learning techniques.
We find that the generated texts tend to be more chaotic while literary works are more complex.
We also demonstrate that the clustering of human texts results in fuzzier clusters in comparison to the more compact and well-separated clusters of bot-generated texts.
arXiv Detail & Related papers (2023-11-19T22:29:15Z)
- Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System [73.52878118434147]
We present methods to reverse-engineer the decoding method used to generate text.
Our ability to discover which decoding strategy was used has implications for detecting generated text.
arXiv Detail & Related papers (2023-09-09T18:19:47Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations [11.93653349589025]
Adversarial examples are constructed by adding small non-random perturbations to correctly classified inputs.
Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, or phrase-level perturbations.
We propose two new reactive methods for NLP to fill this gap.
Adapted LID and MDRE obtain state-of-the-art results on character-level, word-level, and phrase-level attacks on the IMDB dataset.
arXiv Detail & Related papers (2022-04-29T02:32:02Z)
- Deep convolutional forest: a dynamic deep ensemble approach for spam detection in text [219.15486286590016]
This paper introduces a dynamic deep ensemble model for spam detection that adjusts its complexity and extracts features automatically.
As a result, the model achieved high precision, recall, F1-score and accuracy of 98.38%.
arXiv Detail & Related papers (2021-10-10T17:19:37Z)
- Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z)
- PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
An extensive set of experiments shows that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.