Deceptive Deletions for Protecting Withdrawn Posts on Social Platforms
- URL: http://arxiv.org/abs/2005.14113v1
- Date: Thu, 28 May 2020 16:08:33 GMT
- Title: Deceptive Deletions for Protecting Withdrawn Posts on Social Platforms
- Authors: Mohsen Minaei, S Chandra Mouli, Mainack Mondal, Bruno Ribeiro, Aniket
Kate
- Abstract summary: We introduce Deceptive Deletion, a decoy mechanism that minimizes the adversarial advantage.
We show that a powerful global adversary can be beaten by a powerful challenger.
- Score: 21.924023700334065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-sharing poorly-worded thoughts and personal information is prevalent on
online social platforms. In many of these cases, users regret posting such
content. To retrospectively rectify these errors in users' sharing decisions,
most platforms offer (deletion) mechanisms to withdraw the content, and social
media users often utilize them. Ironically and perhaps unfortunately, these
deletions make users more susceptible to privacy violations by malicious actors
who specifically hunt post deletions at large scale. The reason for such
hunting is simple: deleting a post acts as a powerful signal that the post
might be damaging to its owner. Today, multiple archival services are already
scanning social media for these deleted posts. Moreover, as we demonstrate in
this work, powerful machine learning models can detect damaging deletions at
scale.
Towards restraining such a global adversary against users' right to be
forgotten, we introduce Deceptive Deletion, a decoy mechanism that minimizes
the adversarial advantage. Our mechanism injects decoy deletions, creating
a two-player minimax game between an adversary that seeks to classify
damaging content among the deleted posts and a challenger that employs decoy
deletions to camouflage the real damaging deletions. We formalize the Deceptive
Game between the two players, determine conditions under which either the
adversary or the challenger provably wins the game, and discuss the scenarios
in-between these two extremes. We apply the Deceptive Deletion mechanism to a
real-world task on Twitter: hiding damaging tweet deletions. We show that a
powerful global adversary can be beaten by a powerful challenger, raising the
bar significantly and offering a glimmer of hope that users can truly be
forgotten on social platforms.
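To make the minimax interaction concrete, here is a minimal sketch of one way the Deceptive Game could be instantiated, assuming a TF-IDF text adversary and a challenger that greedily deletes the most damaging-looking benign posts as decoys. Every name and parameter below is an illustrative assumption, not the authors' implementation.

```python
# A minimal, assumption-laden sketch of the Deceptive Game: an adversary
# classifies damaging content among deleted posts; a challenger injects
# decoy deletions to erode the adversary's advantage.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def play_deceptive_game(train_posts, train_labels,  # adversary's labeled data
                        damaging_deletions,         # real damaging deletions
                        benign_pool,                # challenger's decoy candidates
                        rounds=5, k=100):
    vec = TfidfVectorizer(max_features=5000)
    adversary = LogisticRegression(max_iter=1000)
    adversary.fit(vec.fit_transform(train_posts), train_labels)
    decoys = []
    for _ in range(rounds):
        # Challenger move: delete the k benign posts the adversary scores
        # as most damaging, so they blend in with the real deletions.
        scores = adversary.predict_proba(vec.transform(benign_pool))[:, 1]
        picked = {int(i) for i in np.argsort(-scores)[:k]}
        decoys += [p for i, p in enumerate(benign_pool) if i in picked]
        benign_pool = [p for i, p in enumerate(benign_pool) if i not in picked]
        # Adversary move: classify the deleted pool (real + decoy deletions).
        deleted = damaging_deletions + decoys
        truth = np.array([1] * len(damaging_deletions) + [0] * len(decoys))
        pred = adversary.predict(vec.transform(deleted))
        # Adversarial advantage: edge over chance in separating real damaging
        # deletions from decoys; 0 means the adversary is fully deceived.
        advantage = float((pred == truth).mean()) - 0.5
        print(f"decoys={len(decoys)}  adversary advantage={advantage:.3f}")
    return decoys
```

A stronger adversary would retrain between rounds; the sketch only illustrates the alternating structure of the game.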
Related papers
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending
Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z)
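For intuition, here is a hedged sketch of the kind of blackbox extraction probe the entry above describes: rephrase the question and check whether the "deleted" answer still surfaces. The generate() interface, prompts, and names are assumptions for illustration, not the paper's code.

```python
from typing import Callable, List

def extraction_probe(generate: Callable[[str], str],
                     paraphrases: List[str],
                     deleted_answer: str) -> bool:
    """Return True if any rephrased prompt still elicits the deleted fact."""
    for prompt in paraphrases:
        if deleted_answer.lower() in generate(prompt).lower():
            return True  # the weight edit failed to remove the information
    return False

# Hypothetical usage against an edited model's text-generation function:
# leaked = extraction_probe(edited_model_generate,
#                           ["The capital of France is",
#                            "France's capital city is called",
#                            "Q: What is France's capital? A:"],
#                           "Paris")
```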
- User Identity Linkage in Social Media Using Linguistic and Social Interaction Features [11.781485566149994]
User identity linkage aims to reveal social media accounts likely to belong to the same natural person.
This work proposes a machine learning-based detection model, which uses multiple attributes of users' online activity.
The model's efficacy is demonstrated on two use cases involving abusive and terrorism-related Twitter content.
arXiv Detail & Related papers (2023-08-22T15:10:38Z)
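As a sketch of what a feature-based linkage model like the one above might look like, consider pairwise account features fed to a binary classifier. The features and names below are illustrative assumptions, not the paper's design.

```python
from sklearn.ensemble import RandomForestClassifier

def jaccard(s, t):
    """Set overlap, used here for vocabulary and mention similarity."""
    return len(s & t) / max(1, len(s | t))

def pair_features(a, b):
    """Compare two accounts' linguistic and social-interaction activity."""
    return [
        jaccard(a["vocab"], b["vocab"]),        # linguistic overlap
        jaccard(a["mentions"], b["mentions"]),  # shared social circle
        abs(a["avg_post_hour"] - b["avg_post_hour"]) / 24.0,  # rhythm gap
    ]

# Hypothetical training on labeled account pairs
# (1 = same natural person, 0 = different people):
# X = [pair_features(a, b) for a, b in account_pairs]
# clf = RandomForestClassifier().fit(X, same_person_labels)
```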
- Detecting and Reasoning of Deleted Tweets before they are Posted [5.300190188468289]
We identify deleted tweets, particularly within the Arabic context, and label them with a corresponding fine-grained disinformation category.
We then develop models that predict the likelihood of a tweet being deleted, as well as the potential reasons behind the deletion.
arXiv Detail & Related papers (2023-05-05T08:25:07Z)
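A minimal sketch of the two tasks the entry above describes, deletion likelihood plus deletion reason, under assumed features and labels; none of this is the paper's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_deletion_models(tweets, is_deleted, reasons):
    """`reasons` holds one fine-grained category per *deleted* tweet."""
    vec = TfidfVectorizer(max_features=10000)
    X = vec.fit_transform(tweets)
    # Task 1: will this tweet eventually be deleted?
    deletion_clf = LogisticRegression(max_iter=1000).fit(X, is_deleted)
    # Task 2: for deleted tweets, which disinformation category explains it?
    deleted = [t for t, d in zip(tweets, is_deleted) if d]
    reason_clf = LogisticRegression(max_iter=1000).fit(
        vec.transform(deleted), reasons)
    return vec, deletion_clf, reason_clf
```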
- Classification of social media Toxic comments using Machine learning models [0.0]
The paper addresses the problem of toxic comments on social media platforms, where individuals use disrespectful, abusive, and unreasonable language.
Such anti-social behavior occurs during online debates, comments, and fights.
The comments containing explicit language can be classified into various categories, such as toxic, severe toxic, obscene, threat, insult, and identity hate.
To protect users from offensive language, companies have started flagging comments and blocking users.
arXiv Detail & Related papers (2023-04-14T05:40:11Z)
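The six categories listed above suggest a multi-label setup; here is a hedged one-vs-rest sketch under that assumption (the pipeline choice is illustrative, not the paper's).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult",
          "identity_hate"]

# One binary classifier per label, since a comment can carry several at once.
toxic_clf = make_pipeline(
    TfidfVectorizer(max_features=20000),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
# Hypothetical fit: y is an (n_comments, len(LABELS)) binary indicator matrix.
# toxic_clf.fit(comments, y)
# predictions = toxic_clf.predict(["example comment to score"])
```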
- Explainable Abuse Detection as Intent Classification and Slot Filling [66.80201541759409]
We introduce the concept of policy-aware abuse detection, abandoning the unrealistic expectation that systems can reliably learn which phenomena constitute abuse from inspecting the data alone.
We show how architectures for intent classification and slot filling can be used for abuse detection, while providing a rationale for model decisions.
arXiv Detail & Related papers (2022-10-06T03:33:30Z)
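To illustrate the decomposition the entry above proposes, here is a schematic sketch of the output structure: an intent naming the violated policy rule plus slots holding the spans that justify it. The field names and example values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AbuseDecision:
    """Policy-aware output: which rule fired, and the rationale spans."""
    intent: str                   # e.g. the policy rule the message violates
    slots: List[Tuple[str, str]]  # (slot name, text span) pairs as rationale

# Hypothetical decision for an abusive message:
decision = AbuseDecision(
    intent="threatening-language",
    slots=[("target", "you"), ("threat_phrase", "watch your back")],
)
```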
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily, and that deletions enable two kinds of abuse.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media platforms that enforce opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab leads users to engage with both types of content, with a slight preference for the questionable ones, which may reflect dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- "Nice Try, Kiddo": Investigating Ad Hominems in Dialogue Responses [87.89632038677912]
Ad hominem attacks are those that target some feature of a person's character instead of the position the person is maintaining.
We propose categories of ad hominems, compose an annotated dataset, and build a system to analyze human and dialogue responses to English Twitter posts.
Our results indicate that 1) responses from both humans and DialoGPT contain more ad hominems for discussions around marginalized communities, 2) different quantities of ad hominems in the training data can influence the likelihood of generating ad hominems, and 3) constrained decoding techniques can be used to reduce ad hominems.
arXiv Detail & Related papers (2020-10-24T07:37:49Z)
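Finding (3) above points at constrained decoding; here is a minimal sketch of one such constraint, a phrase blocklist applied during greedy generation. The phrase list and the candidates() interface are illustrative assumptions, not the paper's method.

```python
from typing import Callable, List

BANNED = ["nice try, kiddo", "you people", "typical of you"]

def constrained_decode(candidates: Callable[[str], List[str]],
                       prompt: str, steps: int = 20) -> str:
    """Greedy decoding that skips continuations introducing banned phrases."""
    text = prompt
    for _ in range(steps):
        for nxt in candidates(text):  # ranked next-chunk candidates from an LM
            if not any(b in (text + nxt).lower() for b in BANNED):
                text += nxt
                break
        else:
            break  # every candidate would introduce a banned phrase: stop
    return text
```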
- Quantifying the Vulnerabilities of the Online Public Square to Adversarial Manipulation Tactics [43.98568073610101]
We use a social media model to quantify the impacts of several adversarial manipulation tactics on the quality of content.
We find that the presence of influential accounts, a hallmark of social media, exacerbates the vulnerabilities of online communities to manipulation.
These insights suggest countermeasures that platforms could employ to increase the resilience of social media users to manipulation.
arXiv Detail & Related papers (2019-07-13T21:12:08Z)