Related papers: "To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers

URL: http://arxiv.org/abs/2006.03256v1
Date: Fri, 5 Jun 2020 06:59:22 GMT
Title: "To Target or Not to Target": Identification and Analysis of Abusive Text Using Ensemble of Classifiers
Authors: Gaurav Verma, Niyati Chhaya, Vishwa Vinay
Abstract summary: We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms. Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language.
Score: 18.053219155702465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With rising concern around abusive and hateful behavior on social media platforms, we present an ensemble learning method to identify and analyze the linguistic properties of such content. Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language. The proposed approach provides results comparable to the existing state of the art on the Twitter Abusive Behavior dataset (Founta et al. 2018) without using any user or network-related information, relying solely on textual properties. We believe that the presented insights and discussion of shortcomings of current approaches will highlight potential directions for future research.
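The abstract does not name the three models in the ensemble, so the following is only a minimal sketch of stacking in general, not the authors' implementation. It assumes three hypothetical base learners (multinomial Naive Bayes over word unigrams, word bigrams, and character trigrams, each standing in for "a different aspect of language") whose out-of-fold probabilities train a logistic-regression meta-classifier. All function names and the toy data are illustrative.

```python
import math
from collections import Counter

# Hypothetical feature "views" (assumption, not the paper's choice of models).
def word_unigrams(text):
    return text.lower().split()

def word_bigrams(text):
    toks = text.lower().split()
    return [a + " " + b for a, b in zip(toks, toks[1:])] or ["<none>"]

def char_trigrams(text):
    padded = "#" + text.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

VIEWS = [word_unigrams, word_bigrams, char_trigrams]

class NaiveBayes:
    """Binary multinomial Naive Bayes with add-one smoothing over one view."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.docs = Counter(labels)
        self.counts = {0: Counter(), 1: Counter()}
        for text, y in zip(texts, labels):
            self.counts[y].update(self.featurize(text))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1])) + 1
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        return self

    def predict_proba(self, text):
        """Return P(abusive | text) under this single view."""
        n = sum(self.docs.values())
        logp = []
        for c in (0, 1):
            lp = math.log((self.docs[c] + 1) / (n + 2))
            for f in self.featurize(text):
                lp += math.log((self.counts[c][f] + 1) / (self.totals[c] + self.vocab_size))
            logp.append(lp)
        m = max(logp)  # subtract the max before exponentiating, for stability
        e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
        return e1 / (e0 + e1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def out_of_fold_features(texts, labels, k=3):
    """Base-model probabilities for each example, from models that never saw it."""
    feats = [None] * len(texts)
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        models = [NaiveBayes(v).fit([texts[i] for i in train],
                                    [labels[i] for i in train]) for v in VIEWS]
        for i in range(fold, len(texts), k):
            feats[i] = [m.predict_proba(texts[i]) for m in models]
    return feats

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Meta-classifier: logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

class StackedEnsemble:
    def fit(self, texts, labels):
        self.w, self.b = fit_logistic(out_of_fold_features(texts, labels), labels)
        # Refit the base models on all data for use at prediction time.
        self.base = [NaiveBayes(v).fit(texts, labels) for v in VIEWS]
        return self

    def predict_proba(self, text):
        x = [m.predict_proba(text) for m in self.base]
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

# Toy, made-up data (1 = abusive, 0 = not abusive).
TEXTS = ["you are an idiot", "shut up you stupid idiot", "you stupid loser",
         "what an idiot loser", "go away you stupid fool", "idiot fool loser",
         "have a nice day", "thanks for sharing this", "great game last night",
         "i love this song", "nice work on the project", "thanks have a great day"]
LABELS = [1] * 6 + [0] * 6

ens = StackedEnsemble().fit(TEXTS, LABELS)
print(ens.predict_proba("you stupid idiot"), ens.predict_proba("have a great day"))
```

Training the meta-classifier on out-of-fold base predictions, rather than in-sample ones, keeps it from simply trusting whichever base model overfits most; this rationale holds regardless of which base learners are plugged in.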

Related papers

Understanding and Analyzing Inappropriately Targeting Language in Online Discourse: A Comparative Annotation Study [1.0923877073891446]
This paper introduces a method for detecting inappropriately targeting language in online conversations by integrating crowd and expert annotations with ChatGPT. We focus on English conversation threads from Reddit, examining comments that target individuals or groups. We perform a comparative analysis of annotations from human experts, crowd annotators, and ChatGPT, revealing strengths and limitations of each method in recognizing both explicit hate speech and subtler discriminatory language.
arXiv Detail & Related papers (2025-05-22T16:10:43Z)
Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights [2.3224139967919974]
We evaluate and explain how adversarial attacks perform in inflectional languages. We use a novel protocol inspired by mechanistic interpretability, based on the Edge Attribution Patching (EAP) method. We create a new benchmark based on the task-oriented dataset MultiEmo.
arXiv Detail & Related papers (2025-05-08T08:00:03Z)
Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models. We survey the literature on techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world. Models that learn to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time. The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have [58.23138483086277]
In this work, we leverage datasets we already have, covering a wide range of tasks related to abusive language detection. Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain. Our experiments show that, using existing datasets and only a few shots of the target task, model performance improves both monolingually and across languages.
arXiv Detail & Related papers (2023-05-23T14:04:12Z)
Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space [2.5288257442251107]
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings. We perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm.
arXiv Detail & Related papers (2023-04-27T17:06:20Z)
Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target. Most existing stance detection models are limited because they do not consider relevant contextual information. We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
A Knowledge-Enhanced Adversarial Model for Cross-lingual Structured Sentiment Analysis [31.05169054736711]
Cross-lingual structured sentiment analysis aims to transfer knowledge from a source language to a target language. We propose a Knowledge-Enhanced Adversarial Model (KEAM) with both implicit distributed and explicit structural knowledge. We conduct experiments on five datasets and compare KEAM with both supervised and unsupervised methods.
arXiv Detail & Related papers (2022-05-31T03:07:51Z)
Towards Zero-shot Sign Language Recognition [11.952300437658703]
This paper tackles the problem of zero-shot sign language recognition. The goal is to leverage models learned over the seen sign classes to recognize the instances of unseen sign classes.
arXiv Detail & Related papers (2022-01-15T19:26:36Z)
Combining Textual Features for the Detection of Hateful and Offensive Language [5.064332352040358]
We present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture.
arXiv Detail & Related papers (2021-12-09T09:50:20Z)
Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels. We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions [0.0]
The IMapBook project aims at improving the literacy and reading comprehension skills of elementary school-aged children by presenting them with interactive e-books and letting them take part in moderated book discussions. This study aims to develop and illustrate a machine learning-based approach to message classification that could be used to automatically notify the discussion moderator of a possible need for an intervention and also to collect other useful information about the ongoing discussion.
arXiv Detail & Related papers (2020-11-23T20:33:09Z)
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning. Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.