A Few Topical Tweets are Enough for Effective User-Level Stance Detection
- URL: http://arxiv.org/abs/2004.03485v1
- Date: Tue, 7 Apr 2020 15:35:55 GMT
- Title: A Few Topical Tweets are Enough for Effective User-Level Stance Detection
- Authors: Younes Samih and Kareem Darwish
- Abstract summary: We tackle stance detection for less vocal Twitter users using two approaches.
In the first approach, we improve user-level stance detection by representing tweets using contextualized embeddings.
In the second approach, we expand the tweets of a given user using their Twitter timeline tweets, and then we perform unsupervised classification of the user.
- Score: 8.118808561953514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stance detection entails ascertaining the position of a user towards a
target, such as an entity, topic, or claim. Recent work that employs
unsupervised classification has shown that performing stance detection on vocal
Twitter users, who have many tweets on a target, can yield very high accuracy
(+98%). However, such methods perform poorly or fail completely for less vocal
users, who may have authored only a few tweets about a target. In this paper,
we tackle stance detection for such users using two approaches. In the first
approach, we improve user-level stance detection by representing tweets using
contextualized embeddings, which capture latent meanings of words in context.
We show that this approach outperforms two strong baselines and achieves 89.6%
accuracy and 91.3% macro F-measure on eight controversial topics. In the second
approach, we expand the tweets of a given user using their Twitter timeline
tweets, and then we perform unsupervised classification of the user, which
entails clustering a user with other users in the training set. This approach
achieves 95.6% accuracy and 93.1% macro F-measure.
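The second approach, representing each user by their (possibly timeline-expanded) tweets and then clustering users without labels, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the synthetic 16-dimensional vectors stand in for real contextualized embeddings (e.g. from BERT), and k-means is just one possible clustering choice.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Placeholder for contextualized tweet embeddings: synthetic vectors
# drawn around two stance "centers" instead of real BERT outputs.
def embed_tweets(n_tweets, center):
    return center + 0.1 * rng.standard_normal((n_tweets, 16))

pro_center = rng.standard_normal(16)
anti_center = rng.standard_normal(16)

# Each user is represented by the mean embedding of their few topical
# tweets (optionally expanded with timeline tweets to add signal).
users = [embed_tweets(3, pro_center).mean(axis=0) for _ in range(10)] + \
        [embed_tweets(3, anti_center).mean(axis=0) for _ in range(10)]
X = np.stack(users)

# Unsupervised user-level stance detection: cluster the users; each
# cluster's stance label would then be read off from the known-stance
# users that land in it.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

With well-separated synthetic centers the two user groups fall cleanly into the two clusters; on real tweets the separation depends on how well the embeddings capture stance-bearing language.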
Related papers
- Real-Time Summarization of Twitter [9.034423337410274]
We focus on the real-time push notification scenario, which requires a system that monitors the stream of sampled tweets and returns the tweets relevant to given interest profiles.
We employ a Dirichlet score, with very little smoothing as a baseline, to classify whether a tweet is relevant to a given interest profile.
Redundant tweets should also be removed from the push queue.
arXiv Detail & Related papers (2024-07-11T01:56:31Z)
- Machine and Deep Learning Applications to Mouse Dynamics for Continuous User Authentication [0.0]
This article builds upon our previously published work by evaluating our dataset of 40 users using three machine learning and deep learning algorithms.
The top performer is a 1-dimensional convolutional neural network with a peak average test accuracy of 85.73% across the top 10 users.
Multi-class classification is also examined using an artificial neural network, which reaches a peak accuracy of 92.48%.
arXiv Detail & Related papers (2022-05-26T21:43:59Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Semi-supervised Stance Detection of Tweets Via Distant Network Supervision [32.86421107987556]
Homophily properties over the social network provide a strong signal of coarse-grained user-level stance.
We present SANDS, a new semi-supervised stance detector.
SANDS achieves a macro-F1 score of 0.55 (0.49) on US (India)-based datasets.
arXiv Detail & Related papers (2022-01-03T13:04:54Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- Automatic Expansion and Retargeting of Arabic Offensive Language Training [12.111859709582617]
We employ two key insights: replies on Twitter often imply opposition, and some accounts are persistently offensive towards specific targets.
We show the efficacy of the approach on Arabic tweets, with 13% and 79% relative F1-measure improvements in entity-specific offensive language detection.
arXiv Detail & Related papers (2021-11-18T08:25:09Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We use feature-preserving autoencoder filtering together with the self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Writer Identification Using Microblogging Texts for Social Media Forensics [53.180678723280145]
We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes.
We test varying sized author sets and varying amounts of training/test texts per author.
arXiv Detail & Related papers (2020-07-31T00:23:18Z)
- Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features [63.48764893706088]
This work aims at identifying authors of tweet messages, which are limited to 280 characters.
We use for our experiments a self-captured database of 40 users, with 120 to 200 tweets per user.
Results using this small set are promising, with the different features providing a classification accuracy between 92% and 98.5%.
arXiv Detail & Related papers (2020-03-24T19:32:11Z)
- Investigating Classification Techniques with Feature Selection For Intention Mining From Twitter Feed [0.0]
Micro-blogging service Twitter has more than 200 million registered users who exchange more than 65 million posts per day.
Most of the tweets are written informally and often in slang.
This paper investigates the problem of selecting features that affect extracting user's intention from Twitter feeds.
arXiv Detail & Related papers (2020-01-22T11:55:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.