Related papers: Birdspotter: A Tool for Analyzing and Labeling Twitter Users

URL: http://arxiv.org/abs/2012.02370v2
Date: Tue, 23 Feb 2021 00:24:17 GMT
Title: Birdspotter: A Tool for Analyzing and Labeling Twitter Users
Authors: Rohit Ram, Quyu Kong, Marian-Andrei Rizoiu
Abstract summary: Birdspotter is a tool to analyze and label Twitter users. Birdspotter.ml is an exploratory visualizer for the computed metrics. We show how to train birdspotter into a fully-fledged bot detector.
Score: 12.558187319452657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The impact of online social media on societal events and institutions is profound, and with the rapid increases in user uptake, we are just starting to understand its ramifications. Social scientists and practitioners who model online discourse as a proxy for real-world behavior often curate large social media datasets. A lack of available tooling aimed at non-data science experts frequently leaves this data (and the insights it holds) underutilized. Here, we propose birdspotter, a tool to analyze and label Twitter users, and birdspotter.ml, an exploratory visualizer for the computed metrics. birdspotter provides an end-to-end analysis pipeline, from the processing of pre-collected Twitter data, to general-purpose labeling of users, and estimating their social influence, within a few lines of code. The package features tutorials and detailed documentation. We also illustrate how to train birdspotter into a fully-fledged bot detector that achieves better than state-of-the-art performance without making any online Twitter API calls, and we showcase its usage in an exploratory analysis of a topical COVID-19 dataset.

Related papers

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
Existing approaches rely on expensive human annotation, making them unsustainable at scale. We propose AgentTrek, a scalable data synthesis pipeline that generates web agent trajectories by leveraging publicly available tutorials. Our fully automated approach significantly reduces data collection costs, achieving a cost of just $0.55 per high-quality trajectory without human annotators.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library of Root Cause Analysis (RCA) for Artificial Intelligence for IT Operations (AIOps). It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data [31.19059013571499]
Twitter-Demographer is a flow-based tool to enrich Twitter data with additional information about tweets and users. We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended.
arXiv Detail & Related papers (2022-01-26T14:59:17Z)
Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring and creating data stories. ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z)
Sentiment Analysis on Social Media Content [0.0]
The aim of this paper is to present a model that can perform sentiment analysis of real data collected from Twitter. Data on Twitter is highly unstructured, which makes it difficult to analyze. Our proposed model differs from prior work in this field because it combines supervised and unsupervised machine learning algorithms.
arXiv Detail & Related papers (2020-07-04T17:03:30Z)
Evently: Modeling and Analyzing Reshare Cascades with Hawkes Processes [12.558187319452657]
Evently is a tool for modeling online reshare cascades and particularly retweet cascades. It provides a comprehensive set of functionalities for processing raw data from Twitter public APIs. We show that, by characterizing users solely based on how their content spreads online, we can disentangle influential users and online bots.
arXiv Detail & Related papers (2020-06-11T03:13:35Z)
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset. This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)
Curating Social Media Data [0.0]
We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv Detail & Related papers (2020-02-21T10:07:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.