Related papers: Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data

URL: http://arxiv.org/abs/2201.10986v1
Date: Wed, 26 Jan 2022 14:59:17 GMT
Title: Twitter-Demographer: A Flow-based Tool to Enrich Twitter Data
Authors: Federico Bianchi, Vincenzo Cutrona, Dirk Hovy
Abstract summary: Twitter-Demographer is a flow-based tool to enrich Twitter data with additional information about tweets and users. We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended.
Score: 31.19059013571499
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Twitter data have become essential to Natural Language Processing (NLP) and social science research, driving various scientific discoveries in recent years. However, the textual data alone are often not enough to conduct studies: especially social scientists need more variables to perform their analysis and control for various factors. How we augment this information, such as users' location, age, or tweet sentiment, has ramifications for anonymity and reproducibility, and requires dedicated effort. This paper describes Twitter-Demographer, a simple, flow-based tool to enrich Twitter data with additional information about tweets and users. Twitter-Demographer is aimed at NLP practitioners and (computational) social scientists who want to enrich their datasets with aggregated information, facilitating reproducibility, and providing algorithmic privacy-by-design measures for pseudo-anonymity. We discuss our design choices, inspired by the flow-based programming paradigm, to use black-box components that can easily be chained together and extended. We also analyze the ethical issues related to the use of this tool, and the built-in measures to facilitate pseudo-anonymity.
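The flow-based design mentioned in the abstract (black-box components chained together, with a pseudo-anonymity step) can be illustrated with a minimal sketch. This is not Twitter-Demographer's actual API: the names `add_text_length`, `pseudonymize_users`, and `run_flow`, and the record fields `user_id` and `text`, are hypothetical and chosen only for illustration. Plain hashing is used here as a stand-in for a privacy step and is not a strong anonymization method by itself.

```python
import hashlib
from typing import Callable, Dict, List

# A record is a flat dictionary; a component maps a list of records to a new list.
Record = Dict[str, str]
Component = Callable[[List[Record]], List[Record]]


def add_text_length(records: List[Record]) -> List[Record]:
    """Hypothetical enrichment step: attach the character count of each tweet."""
    return [{**r, "text_length": str(len(r["text"]))} for r in records]


def pseudonymize_users(records: List[Record]) -> List[Record]:
    """Hypothetical privacy step: replace each user id with its SHA-256 digest."""
    out = []
    for r in records:
        digest = hashlib.sha256(r["user_id"].encode("utf-8")).hexdigest()
        out.append({**r, "user_id": digest})
    return out


def run_flow(records: List[Record], components: List[Component]) -> List[Record]:
    """Pass the records through each black-box component in order."""
    for component in components:
        records = component(records)
    return records


# Toy input; no real Twitter data is involved.
tweets = [{"user_id": "42", "text": "hello world"}]
enriched = run_flow(tweets, [add_text_length, pseudonymize_users])
```

Because every component shares the same input and output type, a new step can be added by writing one more function and appending it to the list passed to `run_flow`.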

Related papers

Do We Trust What They Say or What They Do? A Multimodal User Embedding Provides Personalized Explanations [35.77028281332307]
We propose Contribution-Aware Multimodal User Embedding (CAMUE) for social networks. We show that our approach can provide personalized explainable predictions, automatically mitigating the impact of unreliable information. Our work paves the way for more explainable, reliable, and effective social media user embedding.
arXiv Detail & Related papers (2024-09-04T02:17:32Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks. Our method achieves state-of-the-art results on well-established TAG datasets. Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
Use of social media and Natural Language Processing (NLP) in natural hazard research [0.0]
In the works of Sasaki et al. (2010) and Earle et al. (2011), the authors explored real-time interaction on Twitter for detecting natural hazards. An inherent challenge for such an application is natural language processing (NLP), which basically consists of converting words into numbers. In this report we implement an NLP machine learning process with advanced classification and classification applications.
arXiv Detail & Related papers (2023-04-17T15:03:05Z)
Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only $0.3\%$ of model parameters to learn style-specific attributes for response generation. We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. A supervised machine learning (ML) framework is adopted, using an Extreme Gradient Boosting (XGBoost) algorithm. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset [5.528896840956628]
TM-Senti is a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets. We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset. Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons.
arXiv Detail & Related papers (2021-08-31T14:54:46Z)
Birdspotter: A Tool for Analyzing and Labeling Twitter Users [12.558187319452657]
Birdspotter is a tool to analyze and label Twitter users. Birdspotter.ml is an exploratory visualizer for the computed metrics. We show how to train birdspotter into a fully-fledged bot detector.
arXiv Detail & Related papers (2020-12-04T02:25:07Z)
TIMME: Twitter Ideology-detection via Multi-task Multi-relational Embedding [26.074367752142198]
We aim to solve the problem of predicting people's ideology, or political tendency. We estimate it using Twitter data and formalize it as a classification problem.
arXiv Detail & Related papers (2020-06-02T00:00:39Z)
Privacy-Aware Recommender Systems Challenge on Twitter's Home Timeline [47.434392695347924]
The RecSys 2020 Challenge was organized by ACM RecSys in partnership with Twitter using this dataset. This paper touches on the key challenges faced by researchers and professionals striving to predict user engagements.
arXiv Detail & Related papers (2020-04-28T23:54:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.