Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography
- URL: http://arxiv.org/abs/2405.09090v1
- Date: Wed, 15 May 2024 04:52:09 GMT
- Title: Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography
- Authors: Minhao Bai, Jinshuai Yang, Kaiyi Pang, Huili Wang, Yongfeng Huang
- Abstract summary: Linguistic steganography provides a convenient way to hide messages, particularly with the emergence of AI generation technology.
Existing methods are limited to finding distribution differences between steganographic texts and normal texts from the aspect of symbolic statistics.
This paper proposes to employ the human-like text processing abilities of large language models (LLMs) to capture the difference from the aspect of human perception.
- Score: 18.7168443402118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linguistic steganography provides a convenient way to hide messages, particularly with the emergence of AI generation technology. The potential abuse of this technology raises security concerns within societies, calling for powerful linguistic steganalysis to detect carriers containing steganographic messages. Existing methods are limited to finding distribution differences between steganographic texts and normal texts from the aspect of symbolic statistics. However, the distribution differences between the two kinds of texts are hard to model precisely, which heavily hurts the detection ability of the existing methods in realistic scenarios. To seek a feasible way to construct practical steganalysis in the real world, this paper proposes to employ the human-like text processing abilities of large language models (LLMs) to capture the difference from the aspect of human perception, in addition to the traditional statistical aspect. Specifically, we systematically investigate the performance of LLMs in this task by modeling it as a generative paradigm, instead of the traditional classification paradigm. Extensive experimental results reveal that generative LLMs exhibit significant advantages in linguistic steganalysis and demonstrate performance trends distinct from traditional approaches. Results also reveal that LLMs outperform existing baselines by a wide margin, and the domain-agnostic ability of LLMs makes it possible to train a generic steganalysis model (code and trained models are openly available at https://github.com/ba0z1/Linguistic-Steganalysis-with-LLMs).
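The abstract contrasts a generative formulation of steganalysis with the traditional classification one. As a rough illustration only, the following Python sketch shows what a generative framing could look like: the detector builds an instruction prompt, lets a text generator answer in words, and parses the answer into a label. The prompt wording, the label names `stego` and `cover`, and the helpers `build_prompt`, `parse_answer`, `detect`, and `toy_generate` are all assumptions made for this example; the abstract does not specify the paper's actual prompt or model interface.

```python
from typing import Callable

# Hypothetical instruction template (an assumption; the paper's real prompt
# is not given in the abstract).
PROMPT_TEMPLATE = (
    "Decide whether the following text hides a steganographic message.\n"
    "Text: {text}\n"
    "Answer with one word, 'stego' or 'cover': "
)


def build_prompt(text: str) -> str:
    """Wrap the suspect text in the instruction template."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_answer(generated: str) -> str:
    """Map the generated continuation to a label.

    Returns 'unknown' when the continuation starts with neither label word.
    """
    lowered = generated.strip().lower()
    if lowered.startswith("stego"):
        return "stego"
    if lowered.startswith("cover"):
        return "cover"
    return "unknown"


def detect(text: str, generate: Callable[[str], str]) -> str:
    """Generative detection: the label is produced as text, not as a logit."""
    return parse_answer(generate(build_prompt(text)))


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call (hypothetical); always answers 'Cover.'."""
    return " Cover."


if __name__ == "__main__":
    print(detect("The weather is lovely today.", toy_generate))
```

In a real setting, `generate` would wrap a fine-tuned or prompted LLM; a classification-paradigm detector would instead attach a classifier head to the encoder output and predict a label index directly.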
Related papers
- A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution [57.309390098903]
Authorship attribution aims to identify the origin or author of a document.
Large Language Models (LLMs) with their deep reasoning capabilities and ability to maintain long-range textual associations offer a promising alternative.
Our results on the IMDb and blog datasets show an impressive 85% accuracy in one-shot authorship classification across ten authors.
arXiv Detail & Related papers (2024-10-29T04:14:23Z)
- Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework [9.976099891796784]
Large language models (LLMs) have transformed human writing by enhancing grammar correction, content expansion, and stylistic refinement.
Existing detection methods, which mainly rely on single-feature analysis and binary classification, often fail to effectively identify LLM-generated text in academic contexts.
We propose a novel Multi-level Fine-grained Detection framework that detects LLM-generated text by integrating low-level structural, high-level semantic, and deep-level linguistic features.
arXiv Detail & Related papers (2024-10-18T07:25:00Z)
- Training-free LLM-generated Text Detection by Mining Token Probability Sequences [18.955509967889782]
Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains.
Training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability.
We introduce a novel training-free detector, termed Lastde, that synergizes local and global statistics for enhanced detection.
arXiv Detail & Related papers (2024-10-08T14:23:45Z)
- Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models [41.233879429714925]
This study critically examines the capacity of API-based large language models to comprehend phrase semantics.
We assess the performance of LLMs in executing phrase semantic reasoning tasks guided by natural language instructions.
We conduct detailed error analyses to interpret the limitations faced by LLMs in comprehending phrase semantics.
arXiv Detail & Related papers (2024-10-03T08:44:17Z)
- Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs [0.41436032949434404]
We develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets.
We show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue.
arXiv Detail & Related papers (2024-08-19T07:14:15Z)
- ReMoDetect: Reward Models Recognize Aligned LLM's Generations [55.06804460642062]
Large language models (LLMs) generate human-preferable texts.
In this paper, we identify the common characteristics shared by these models.
We propose two training schemes to further improve the detection ability of the reward model.
arXiv Detail & Related papers (2024-05-27T17:38:33Z)
- Zero-shot Causal Graph Extrapolation from Text via LLMs [50.596179963913045]
We evaluate the ability of large language models (LLMs) to infer causal relations from natural language.
LLMs show competitive performance in a benchmark of pairwise relations without needing (explicit) training samples.
We extend our approach to extrapolating causal graphs through iterated pairwise queries.
arXiv Detail & Related papers (2023-12-22T13:14:38Z)
- Language Model Decoding as Direct Metrics Optimization [87.68281625776282]
Current decoding methods struggle to generate texts that align with human texts across different aspects.
In this work, we frame decoding from a language model as an optimization problem with the goal of strictly matching the expected performance with human texts.
We prove that this induced distribution is guaranteed to improve the perplexity on human texts, which suggests a better approximation to the underlying distribution of human texts.
arXiv Detail & Related papers (2023-10-02T09:35:27Z)
- Neural Authorship Attribution: Stylometric Analysis on Large Language Models [16.63955074133222]
Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text.
With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics.
arXiv Detail & Related papers (2023-08-14T17:46:52Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)