Detecting Insincere Questions from Text: A Transfer Learning Approach
- URL: http://arxiv.org/abs/2012.07587v1
- Date: Mon, 7 Dec 2020 15:03:48 GMT
- Title: Detecting Insincere Questions from Text: A Transfer Learning Approach
- Authors: Ashwin Rachha and Gaurav Vanmane
- Abstract summary: The internet today has become an unrivalled source of information where people converse on content-based websites such as Quora, Reddit, StackOverflow and Twitter.
A major problem arising on such websites is the proliferation of toxic comments and instances of insincerity, wherein users, instead of maintaining a sincere motive, indulge in spreading toxic and divisive content.
In this paper we solve the Insincere Questions Classification problem by fine-tuning four cutting-edge models, viz. BERT, RoBERTa, DistilBERT and ALBERT.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The internet today has become an unrivalled source of information where people converse on content-based websites such as Quora, Reddit, StackOverflow and Twitter, asking questions and sharing knowledge with the world. A major problem arising on such websites is the proliferation of toxic comments and instances of insincerity, wherein users, instead of maintaining a sincere motive, indulge in spreading toxic and divisive content. The straightforward course of action in confronting this situation is to detect such content early and prevent it from remaining online. In recent times, transfer learning in Natural Language Processing has seen unprecedented growth. With the advent of transformers and other state-of-the-art innovations, tremendous progress has been made across NLP domains. The introduction of BERT caused quite a stir in the NLP community: upon publication, BERT dominated performance benchmarks, inspiring many other authors to experiment with it and publish similar models. This led to the development of a whole BERT family, each member specialized for a different task. In this paper we solve the Insincere Questions Classification problem by fine-tuning four cutting-edge models, viz. BERT, RoBERTa, DistilBERT and ALBERT.
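The fine-tuning setup the abstract describes maps naturally onto the Hugging Face Transformers API. Below is a minimal sketch, assuming the Kaggle Quora Insincere Questions data; the two toy examples, hyperparameters and checkpoint identifiers are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch: fine-tuning the four BERT-family models for
    # binary insincere-question classification (0 = sincere, 1 = insincere).
    import torch
    from torch.utils.data import Dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    CHECKPOINTS = ["bert-base-uncased", "roberta-base",
                   "distilbert-base-uncased", "albert-base-v2"]

    class QuestionDataset(Dataset):
        """Tokenized questions paired with 0/1 sincerity labels."""
        def __init__(self, questions, labels, tokenizer):
            self.enc = tokenizer(questions, truncation=True, max_length=64,
                                 padding="max_length", return_tensors="pt")
            self.labels = torch.tensor(labels)
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            return {**{k: v[i] for k, v in self.enc.items()},
                    "labels": self.labels[i]}

    # Stand-in rows; in practice these come from the training corpus.
    questions = ["How do I start learning Python?",
                 "Why are people from group X so dumb?"]
    labels = [0, 1]

    for name in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=2)
        trainer = Trainer(
            model=model,
            args=TrainingArguments(output_dir=f"out/{name}",
                                   num_train_epochs=1,
                                   per_device_train_batch_size=2),
            train_dataset=QuestionDataset(questions, labels, tokenizer))
        trainer.train()  # fine-tunes the full encoder plus a new head

Each checkpoint ships with a randomly initialized classification head on top of the pre-trained encoder, so the loop fine-tunes all four models under identical settings and leaves their comparison to a shared validation split.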
Related papers
- QiBERT -- Classifying Online Conversations Messages with BERT as a Feature
This paper aims to use data obtained from online social conversations in Portuguese schools to observe behavioural trends.
This project used state-of-the-art (SoA) Machine Learning (ML) algorithms and methods, applying BERT-based models to classify whether utterances are on or off the debate topic (a sketch of this feature-extraction pattern appears after this list).
arXiv Detail & Related papers (2024-09-09T11:38:06Z)
- Multi-class Regret Detection in Hindi Devanagari Script
This study focuses on regret and how it is expressed, specifically in Hindi, on various social media platforms.
We present a novel dataset from three different sources, where each sentence has been manually classified into one of three classes: "Regret by action", "Regret by inaction", and "No regret".
Our findings indicate that individuals on social media platforms frequently express regret for both past inactions and actions.
arXiv Detail & Related papers (2024-01-29T20:58:43Z)
- Understanding writing style in social media with a supervised contrastively pre-trained transformer
Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation.
We introduce the Style Transformer for Authorship Representations (STAR), trained on a large corpus derived from public sources of 4.5 × 10^6 authored texts.
Using a support base of 8 documents of 512 tokens, we can discern authors from sets of up to 1616 authors with at least 80% accuracy.
arXiv Detail & Related papers (2023-10-17T09:01:17Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
arXiv Detail & Related papers (2022-12-27T16:08:49Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- fBERT: A Neural Transformer for Identifying Offensive Content
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available, with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z)
- hBert + BiasCorp -- Fighting Racism on the Web
We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three sources: Fox News, BreitbartNews and YouTube.
In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.
We are also releasing a JavaScript library and a Chrome Extension Application, to help developers make use of our trained model in web applications.
arXiv Detail & Related papers (2021-04-06T02:17:20Z)
- A Weakly Supervised Approach for Classifying Stance in Twitter Replies
Adversarial reactions are prevalent in online conversations.
Inferring those adverse views (or stance) from the text in replies is difficult.
We propose a weakly-supervised approach to predict the stance in Twitter replies.
arXiv Detail & Related papers (2021-03-12T06:02:45Z)
- HASOCOne@FIRE-HASOC2020: Using BERT and Multilingual BERT models for Hate Speech Detection
We propose an approach to automatically classify hate speech and offensive content.
We have used the datasets obtained from the FIRE 2019 and 2020 shared tasks.
We observed that the pre-trained BERT model and the multilingual-BERT model gave the best results.
arXiv Detail & Related papers (2021-01-22T08:55:32Z)
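The QiBERT entry above treats BERT as a frozen feature extractor rather than fine-tuning it end to end. Here is a minimal sketch of that pattern, using mean pooling over the last hidden layer and a logistic-regression head; the checkpoint, pooling and classifier choices are illustrative assumptions, not details from that paper.

    # Hypothetical sketch: BERT embeddings as fixed features for a
    # lightweight downstream classifier (the "BERT as a feature" pattern).
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

    def embed(sentences):
        """Mean-pool last-layer states into one vector per sentence."""
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**batch).last_hidden_state        # (B, T, 768)
        mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (B, 768)

    # Toy stand-ins for labelled conversation utterances.
    texts = ["That point speaks to the motion being debated.",
             "What's for lunch today?"]
    labels = [1, 0]  # 1 = on-topic, 0 = off-topic

    clf = LogisticRegression().fit(embed(texts), labels)
    print(clf.predict(embed(["Let's return to the debate subject."])))

Freezing the encoder keeps training cheap and lets the same embeddings serve several downstream classifiers, at the cost of the accuracy gains that full fine-tuning usually brings.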