Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage
- URL: http://arxiv.org/abs/2212.14727v1
- Date: Tue, 27 Dec 2022 16:08:49 GMT
- Title: Countering Malicious Content Moderation Evasion in Online Social
Networks: Simulation and Detection of Word Camouflage
- Authors: Álvaro Huertas-García and Alejandro Martín and Javier Huertas
Tato and David Camacho
- Abstract summary: Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of evasion of content.
- Score: 64.78260098263489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Content moderation is the process of screening and monitoring user-generated
content online. It plays a crucial role in stopping content resulting from
unacceptable behaviors such as hate speech, harassment, violence against
specific groups, terrorism, racism, xenophobia, homophobia, or misogyny, to
name a few, on Online Social Platforms. These platforms make use of a
plethora of tools to detect and manage malicious information; however,
malicious actors also improve their skills, developing strategies to surpass
these barriers and continuing to spread misleading information. Twisting and
camouflaging keywords are among the most used techniques to evade platform
content moderation systems. In response to this recent ongoing issue, this
paper presents an innovative approach to address this linguistic trend in
social networks through the simulation of different content evasion techniques
and a multilingual Transformer model for content evasion detection. In this
way, we share with the rest of the scientific community a multilingual public
tool, named "pyleetspeak" to generate/simulate in a customizable way the
phenomenon of content evasion through automatic word camouflage and a
multilingual Named-Entity Recognition (NER) Transformer-based model tuned for
its recognition and detection. The multilingual NER model is evaluated in
different textual scenarios, detecting different types and mixtures of
camouflage techniques, achieving an overall weighted F1 score of 0.8795. This
article contributes significantly to countering malicious information by
developing multilingual tools to simulate and detect new methods of evasion of
content on social networks, making the fight against information disorders more
effective.
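As a rough illustration of the word camouflage the abstract describes, the following minimal sketch substitutes look-alike digits for letters in selected keywords and records the character spans that a NER-style detector would be trained to recover. It does not use or reproduce the pyleetspeak API: the substitution table `LEET_MAP`, the functions `camouflage` and `camouflage_text`, and the `LEETSPEAK` label are all hypothetical names chosen for this example.

```python
import random

# Hypothetical substitution table for illustration only;
# this is NOT the mapping used by pyleetspeak.
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def camouflage(word, prob=1.0, rng=None):
    """Replace each mappable character with probability `prob`."""
    rng = rng or random.Random()
    out = []
    for ch in word:
        sub = LEET_MAP.get(ch.lower())
        # rng.random() is in [0, 1), so prob=1.0 always substitutes
        # and prob=0.0 never does.
        if sub is not None and rng.random() < prob:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def camouflage_text(text, keywords, prob=1.0, seed=None):
    """Camouflage only the given keywords in a space-separated text.

    Returns the new text and a list of (start, end, label) character
    spans marking the camouflaged words, in the style of NER annotations.
    """
    rng = random.Random(seed)
    out_tokens = []
    spans = []
    pos = 0  # character offset of the current token in the output text
    for tok in text.split(" "):
        new_tok = tok
        if tok.lower() in keywords:
            new_tok = camouflage(tok, prob, rng)
            if new_tok != tok:
                spans.append((pos, pos + len(new_tok), "LEETSPEAK"))
        out_tokens.append(new_tok)
        pos += len(new_tok) + 1  # +1 for the separating space
    return " ".join(out_tokens), spans


if __name__ == "__main__":
    new_text, spans = camouflage_text("the vaccine is safe", {"vaccine"})
    print(new_text)
    print(spans)
```

Lowering `prob` yields partially camouflaged words, which loosely mirrors the idea of simulating different mixtures of evasion; the span list is the kind of supervision a token-classification model could be trained on.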
Related papers
- Fine-Tuning Llama 2 Large Language Models for Detecting Online Sexual
Predatory Chats and Abusive Texts [2.406214748890827]
This paper proposes an approach to detecting online sexual predatory chats and abusive language using the open-source pretrained Llama 2 7B-parameter model.
We fine-tune the LLM using datasets with different sizes, imbalance degrees, and languages (i.e., English, Roman Urdu and Urdu).
Experimental results show a strong performance of the proposed approach, which performs proficiently and consistently across three distinct datasets.
arXiv Detail & Related papers (2023-08-28T16:18:50Z)
- Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware
Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- A Study of Slang Representation Methods [3.511369967593153]
We study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding.
Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements.
arXiv Detail & Related papers (2022-12-11T21:56:44Z)
- Panning for gold: Lessons learned from the platform-agnostic automated
detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z)
- Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
- A New Generation of Perspective API: Efficient Multilingual
Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Role of Artificial Intelligence in Detection of Hateful Speech for
Hinglish Data on Social Media [1.8899300124593648]
The prevalence of Hindi-English code-mixed data (Hinglish) is on the rise among urban populations all over the world.
Hate speech detection algorithms deployed by most social networking platforms are unable to filter out offensive and abusive content posted in these code-mixed languages.
We propose a methodology for efficient detection of unstructured code-mix Hinglish language.
arXiv Detail & Related papers (2021-05-11T10:02:28Z)
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z)