Identifying and Categorizing Offensive Language in Social Media
- URL: http://arxiv.org/abs/2104.04871v1
- Date: Sat, 10 Apr 2021 22:53:43 GMT
- Title: Identifying and Categorizing Offensive Language in Social Media
- Authors: Nikhil Oswal
- Abstract summary: This study describes a classification system built for SemEval-2019 Task 6: OffensEval.
We trained machine learning and deep learning models, combined with data preprocessing and sampling techniques, to obtain the best results.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Offensive language is pervasive in social media. Individuals frequently take
advantage of the perceived anonymity of computer-mediated communication, using
this to engage in behavior that many of them would not consider in real life.
The automatic identification of offensive content online is an important task
that has gained more attention in recent years. This task can be modeled as a
supervised classification problem in which systems are trained using a dataset
containing posts that are annotated with respect to the presence of some
form(s) of abusive or offensive content. The objective of this study is to
provide a description of a classification system built for SemEval-2019 Task 6:
OffensEval. This system classifies a tweet as either offensive or not offensive
(Sub-task A) and further classifies offensive tweets into categories (Sub-tasks
B & C). We trained machine learning and deep learning models, combined with data
preprocessing and sampling techniques, to obtain the best results. Models
discussed include Naive Bayes, SVM, Logistic Regression, Random Forest and
LSTM.
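To make the Sub-task A setup concrete, below is a minimal sketch of a binary offensive/not-offensive baseline using one of the classical models named above (TF-IDF features with Logistic Regression in scikit-learn). The file name, column names, labels, and hyperparameters are illustrative assumptions rather than the authors' actual pipeline, and class weighting stands in here for the paper's sampling techniques.

```python
# Minimal sketch of a Sub-task A baseline (offensive vs. not offensive).
# Assumptions: a CSV with columns "tweet" and "subtask_a" (labels "OFF"/"NOT"),
# TF-IDF features, and Logistic Regression with balanced class weights as a
# stand-in for the paper's sampling techniques. Not the authors' exact pipeline.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load the annotated tweets (hypothetical file and column names).
data = pd.read_csv("olid_training.csv")
X_train, X_val, y_train, y_val = train_test_split(
    data["tweet"], data["subtask_a"], test_size=0.2, random_state=42,
    stratify=data["subtask_a"],
)

# TF-IDF features feeding a Logistic Regression classifier; class_weight
# compensates for the skew between offensive and non-offensive tweets.
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X_train, y_train)

# Macro-averaged metrics mirror the OffensEval evaluation setup.
print(classification_report(y_val, model.predict(X_val), digits=4))
```

The other classical models listed (Naive Bayes, SVM, Random Forest) could be swapped in by replacing the final pipeline step, while an LSTM variant would instead operate on padded token sequences fed through an embedding layer.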
Related papers
- THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech [2.7061497863588126]
THOS is a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message.
We demonstrate that this dataset makes it feasible to train classifiers, based on Large Language Models, to perform classification at this level of granularity.
arXiv Detail & Related papers (2023-11-11T00:30:31Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- Sexism Identification in Tweets and Gabs using Deep Neural Networks [6.531659195805749]
This paper explores the classification of sexism in text using a variety of deep neural network model architectures.
It performs binary and multiclass sexism classification on the dataset of tweets and gabs from the sEXism Identification in Social neTworks (EXIST) task in IberLEF 2021.
The models are seen to perform comparably to those from the competition, with the best performances seen using BERT and a multi-filter CNN model.
arXiv Detail & Related papers (2021-11-05T16:57:08Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- DeL-haTE: A Deep Learning Tunable Ensemble for Hate Speech Detection [0.04297070083645048]
Online hate speech on social media has become a fast-growing problem in recent times.
Three key challenges in automated detection and classification of hateful content are the lack of clearly labeled data, evolving vocabulary and lexicon, and the lack of baseline models for fringe outlets such as Gab.
In this work, we propose a novel framework with three major contributions.
arXiv Detail & Related papers (2020-11-03T17:32:50Z)
- WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification on Different Twitter Datasets [0.0]
A key to fighting offensive language on social media is the existence of an automatic offensive language detection system.
In this paper, we describe the system submitted by WideBot AI Lab for the shared task, which ranked 10th out of 52 participants with a Macro-F1 of 86.9%.
We also introduced a neural network approach that enhanced our system's predictive ability and includes CNN, highway network, Bi-LSTM, and attention layers.
arXiv Detail & Related papers (2020-09-11T14:10:03Z)
- Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.