Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data
- URL: http://arxiv.org/abs/2201.10986v1
- Date: Wed, 26 Jan 2022 14:59:17 GMT
- Title: Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data
- Authors: Federico Bianchi, Vincenzo Cutrona, Dirk Hovy
- Abstract summary: Twitter-Demographer is a flow-based tool to enrich Twitter data with additional information about tweets and users.
We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended.
- Score: 31.19059013571499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Twitter data have become essential to Natural Language Processing (NLP) and
social science research, driving various scientific discoveries in recent
years. However, the textual data alone are often not enough to conduct studies:
especially social scientists need more variables to perform their analysis and
control for various factors. How we augment this information, such as users'
location, age, or tweet sentiment, has ramifications for anonymity and
reproducibility, and requires dedicated effort. This paper describes
Twitter-Demographer, a simple, flow-based tool to enrich Twitter data with
additional information about tweets and users. Twitter-Demographer is aimed at
NLP practitioners and (computational) social scientists who want to enrich
their datasets with aggregated information, facilitating reproducibility, and
providing algorithmic privacy-by-design measures for pseudo-anonymity. We
discuss our design choices, inspired by the flow-based programming paradigm, to
use black-box components that can easily be chained together and extended. We
also analyze the ethical issues related to the use of this tool, and the
built-in measures to facilitate pseudo-anonymity.
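The abstract describes black-box components that are chained together in a flow. The following is a minimal sketch of that general idea only; it is not Twitter-Demographer's actual API. Every name here (`Record`, `add_text_length`, `pseudonymize_users`, `run_flow`, and the `screen_name`/`text` fields) is a hypothetical chosen for illustration, and the unsalted truncated hash is a stand-in for whatever pseudo-anonymity measures the tool really applies.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical record and component types; not taken from the tool itself.
Record = Dict[str, object]
Component = Callable[[List[Record]], List[Record]]


def add_text_length(records: List[Record]) -> List[Record]:
    """Hypothetical enrichment step: attach the character length of each tweet."""
    return [{**r, "text_length": len(str(r["text"]))} for r in records]


def pseudonymize_users(records: List[Record]) -> List[Record]:
    """Hypothetical privacy step: swap the screen name for a truncated SHA-256 digest."""
    out = []
    for r in records:
        digest = hashlib.sha256(str(r["screen_name"]).encode("utf-8")).hexdigest()[:12]
        # Copy every field except the identifying one, then add the digest.
        enriched = {k: v for k, v in r.items() if k != "screen_name"}
        enriched["user_hash"] = digest
        out.append(enriched)
    return out


def run_flow(records: List[Record], components: List[Component]) -> List[Record]:
    """Pass the records through each component in order, like stages in a flow."""
    for component in components:
        records = component(records)
    return records


tweets = [{"screen_name": "alice", "text": "hello world"}]
result = run_flow(tweets, [add_text_length, pseudonymize_users])
print(result)
```

Because each component only takes and returns a list of records, a new enrichment step can be added to the list passed to `run_flow` without changing the others. A plain unsalted hash is weak protection on its own, so this step should be read as a placeholder rather than a recommended privacy measure.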
Related papers
- Do We Trust What They Say or What They Do? A Multimodal User Embedding Provides Personalized Explanations [35.77028281332307]
We propose Contribution-Aware Multimodal User Embedding (CAMUE) for social networks.
We show that our approach can provide personalized explainable predictions, automatically mitigating the impact of unreliable information.
Our work paves the way for more explainable, reliable, and effective social media user embedding.
arXiv Detail & Related papers (2024-09-04T02:17:32Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- Use of social media and Natural Language Processing (NLP) in natural hazard research [0.0]
In the works of Sasaki et al. (2010) and Earle et al. (2011) the authors explored the real-time interaction on Twitter for detecting natural hazards.
An inherent challenge for such an application is natural language processing (NLP), which essentially consists of converting words into numbers.
In this report we implement an NLP machine learning process with advanced classification and classification applications.
arXiv Detail & Related papers (2023-04-17T15:03:05Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset [5.528896840956628]
TM-Senti is a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets.
We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset.
Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons.
arXiv Detail & Related papers (2021-08-31T14:54:46Z)
- Birdspotter: A Tool for Analyzing and Labeling Twitter Users [12.558187319452657]
Birdspotter is a tool to analyze and label Twitter users.
Birdspotter.ml is an exploratory visualizer for the computed metrics.
We show how to train birdspotter into a fully-fledged bot detector.
arXiv Detail & Related papers (2020-12-04T02:25:07Z)
- TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding [26.074367752142198]
We aim at solving the problem of predicting people's ideology, or political tendency.
We estimate it by using Twitter data, and formalize it as a classification problem.
arXiv Detail & Related papers (2020-06-02T00:00:39Z)
- Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset.
This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)