"To Target or Not to Target": Identification and Analysis of Abusive
Text Using Ensemble of Classifiers
- URL: http://arxiv.org/abs/2006.03256v1
- Date: Fri, 5 Jun 2020 06:59:22 GMT
- Authors: Gaurav Verma, Niyati Chhaya, Vishwa Vinay
- Abstract summary: We present an ensemble learning method to identify and analyze abusive and hateful content on social media platforms.
Our stacked ensemble comprises three machine learning models that capture different aspects of language and provide diverse and coherent insights about inappropriate language.
- Score: 18.053219155702465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With rising concern around abusive and hateful behavior on social media
platforms, we present an ensemble learning method to identify and analyze the
linguistic properties of such content. Our stacked ensemble comprises three
machine learning models that capture different aspects of language and provide
diverse and coherent insights about inappropriate language. The proposed
approach provides comparable results to the existing state-of-the-art on the
Twitter Abusive Behavior dataset (Founta et al. 2018) without using any user or
network-related information, relying solely on textual properties. We believe
that the presented insights and discussion of shortcomings of current
approaches will highlight potential directions for future research.
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
Models learned to bridge the gap between such modalities, coupled with large-scale training data, facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, holding interactive dialogues by asking questions about an image or video scene, or manipulating a robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have [58.23138483086277]
In this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection.
Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain.
Our experiments show that, using already existing datasets and only a few shots of the target task, the performance of models improves both monolingually and across languages.
arXiv Detail & Related papers (2023-05-23T14:04:12Z)
- Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space [2.5288257442251107]
The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings.
We perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings.
Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm.
arXiv Detail & Related papers (2023-04-27T17:06:20Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- A Knowledge-Enhanced Adversarial Model for Cross-lingual Structured Sentiment Analysis [31.05169054736711]
The cross-lingual structured sentiment analysis task aims to transfer knowledge from a source language to a target one.
We propose a Knowledge-Enhanced Adversarial Model (KEAM) with both implicit distributed and explicit structural knowledge.
We conduct experiments on five datasets and compare KEAM with both supervised and unsupervised methods.
arXiv Detail & Related papers (2022-05-31T03:07:51Z)
- Towards Zero-shot Sign Language Recognition [11.952300437658703]
This paper tackles the problem of zero-shot sign language recognition.
The goal is to leverage models learned over the seen sign classes to recognize the instances of unseen sign classes.
arXiv Detail & Related papers (2022-01-15T19:26:36Z)
- Combining Textual Features for the Detection of Hateful and Offensive Language [5.064332352040358]
We present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter.
We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture.
arXiv Detail & Related papers (2021-12-09T09:50:20Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions [0.0]
The IMapBook project aims at improving the literacy and reading comprehension skills of elementary school-aged children by presenting them with interactive e-books and letting them take part in moderated book discussions.
This study aims to develop and illustrate a machine learning-based approach to message classification that could be used to automatically notify the discussion moderator of a possible need for an intervention and also to collect other useful information about the ongoing discussion.
arXiv Detail & Related papers (2020-11-23T20:33:09Z)
- Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.