When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text
- URL: http://arxiv.org/abs/2506.09975v2
- Date: Mon, 16 Jun 2025 13:31:25 GMT
- Title: When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text
- Authors: Hillary Dawkins, Kathleen C. Fraser, Svetlana Kiritchenko,
- Abstract summary: Social media represents a significant attack vector in online influence campaigns.<n>We create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs.<n>We show that while the posts can be detected under typical research assumptions, under the more realistic assumption that an attacker will not release their fine-tuned model to the public, detectability drops dramatically.
- Score: 13.14749943120523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting AI-generated text is a difficult problem to begin with; detecting AI-generated text on social media is made even more difficult due to the short text length and informal, idiosyncratic language of the internet. It is nonetheless important to tackle this problem, as social media represents a significant attack vector in online influence campaigns, which may be bolstered through the use of mass-produced AI-generated posts supporting (or opposing) particular policies, decisions, or events. We approach this problem with the mindset and resources of a reasonably sophisticated threat actor, and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics. We show that while the posts can be detected under typical research assumptions about knowledge of and access to the generating models, under the more realistic assumption that an attacker will not release their fine-tuned model to the public, detectability drops dramatically. This result is confirmed with a human study. Ablation experiments highlight the vulnerability of various detection algorithms to fine-tuned LLMs. This result has implications across all detection domains, since fine-tuning is a generally applicable and realistic LLM use case.
Related papers
- Passing the Turing Test in Political Discourse: Fine-Tuning LLMs to Mimic Polarized Social Media Comments [0.0]
This study explores the extent to which fine-tuned large language models (LLMs) can replicate and amplify polarizing discourse.<n>Using a curated dataset of politically charged discussions extracted from Reddit, we fine-tune an open-source LLM to produce context-aware and ideologically aligned responses.<n>The results indicate that, when trained on partisan data, LLMs are capable of producing highly plausible and provocative comments, often indistinguishable from those written by humans.
arXiv Detail & Related papers (2025-06-17T15:41:26Z) - Could AI Trace and Explain the Origins of AI-Generated Images and Text? [53.11173194293537]
AI-generated content is increasingly prevalent in the real world.<n> adversaries might exploit large multimodal models to create images that violate ethical or legal standards.<n>Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv Detail & Related papers (2025-04-05T20:51:54Z) - Almost AI, Almost Human: The Challenge of Detecting AI-Polished Writing [55.2480439325792]
This study systematically evaluations twelve state-of-the-art AI-text detectors using our AI-Polished-Text Evaluation dataset.<n>Our findings reveal that detectors frequently flag even minimally polished text as AI-generated, struggle to differentiate between degrees of AI involvement, and exhibit biases against older and smaller models.
arXiv Detail & Related papers (2025-02-21T18:45:37Z) - Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods [13.14749943120523]
Knowing whether a text was produced by human or artificial intelligence (AI) is important to determining its trustworthiness.<n>State-of-the art approaches to AIGT detection include watermarking, statistical and stylistic analysis, and machine learning classification.<n>We aim to provide insight into the salient factors that combine to determine how "detectable" AIGT text is under different scenarios.
arXiv Detail & Related papers (2024-06-21T18:31:49Z) - Modes of Analyzing Disinformation Narratives With AI/ML/Text Mining to Assist in Mitigating the Weaponization of Social Media [0.8287206589886879]
This paper highlights the developing need for quantitative modes for capturing and monitoring malicious communication in social media.
There has been a deliberate "weaponization" of messaging through the use of social networks including by politically oriented entities both state sponsored and privately run.
Despite attempts to introduce moderation on major platforms like Facebook and X/Twitter, there are now established alternative social networks that offer completely unmoderated spaces.
arXiv Detail & Related papers (2024-05-25T00:02:14Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - Who Said That? Benchmarking Social Media AI Detection [12.862865254507177]
This paper introduces SAID (Social media AI Detection), a novel benchmark developed to assess AI-text detection models' capabilities in real social media platforms.
It incorporates real AI-generate text from popular social media platforms like Zhihu and Quora.
A notable finding of our study, based on the Zhihu dataset, reveals that annotators can distinguish between AI-generated and human-generated texts with an average accuracy rate of 96.5%.
arXiv Detail & Related papers (2023-10-12T11:35:24Z) - Factuality Challenges in the Era of Large Language Models [113.3282633305118]
Large Language Models (LLMs) generate false, erroneous, or misleading content.
LLMs can be exploited for malicious applications.
This poses a significant challenge to society in terms of the potential deception of users.
arXiv Detail & Related papers (2023-10-08T14:55:02Z) - MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications.<n>The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use.<n>We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.