Curating Social Media Data
- URL: http://arxiv.org/abs/2002.09202v1
- Date: Fri, 21 Feb 2020 10:07:15 GMT
- Title: Curating Social Media Data
- Authors: Kushal Vaghani
- Abstract summary: We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data.
Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools.
The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media platforms have empowered the democratization of the pulse of
people in the modern era. Due to its immense popularity and high usage, data
published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich
ocean of information. Therefore, data-driven analytics of social imprints has
become a vital asset for organisations and governments to further improve their
products and services. However, due to the dynamic and noisy nature of social
media data, performing accurate analysis on raw data is a challenging task. A
key requirement is to curate the raw data before it is fed into analytics pipelines.
This curation process transforms the raw data into contextualized data and
knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable
analysts to cleanse and curate social data and prepare it for reliable
analytics. Our pipeline provides automatic feature extraction from a corpus
of social media data using existing in-house tools. Further, we offer a
dual-correction mechanism using both automated and crowd-sourced approaches.
The implementation of this pipeline also includes a set of tools for
automatically creating micro-tasks to facilitate the contribution of crowd
users in curating the raw data. For the purposes of this research, we use
Twitter as our motivational social media data platform due to its popularity.
Related papers
- Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets.
Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects.
We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z)
- Countering Misinformation via Emotional Response Generation [15.383062216223971]
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy.
Previous research has shown how social correction can be an effective way to curb misinformation.
We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z)
- Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.
Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments.
We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies.
We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
arXiv Detail & Related papers (2023-04-25T22:17:31Z)
- Analyzing social media with crowdsourcing in Crowd4SDG [1.1403672224109254]
This study presents an approach that provides flexible support for analyzing social media, particularly during emergencies.
The focus is on analyzing images and text contained in social media posts and a set of automatic data processing tools for filtering, classification, and geolocation of content.
Such support includes both feedback and suggestions to configure automated tools, and crowdsourcing to gather inputs from citizens.
arXiv Detail & Related papers (2022-08-04T14:42:20Z)
- Designing a Social Media Analytics Dashboard for Government Agency Crisis Communications [0.0]
Government agencies are increasingly turning to social media to use it as a mouthpiece in times of crisis.
Government agencies need tools that support them in analysing social media data for the public good.
This paper presents a design science research approach that guides the development of a social media analytics dashboard for a regional government agency.
arXiv Detail & Related papers (2022-02-11T10:41:01Z)
- Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework called "PERS".
Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
- Birdspotter: A Tool for Analyzing and Labeling Twitter Users [12.558187319452657]
Birdspotter is a tool to analyze and label Twitter users.
Birdspotter.ml is an exploratory visualizer for the computed metrics.
We show how to train birdspotter into a fully-fledged bot detector.
arXiv Detail & Related papers (2020-12-04T02:25:07Z)
- Knowledge Discovery from Social Media using Big Data provided Sentiment Analysis (SoMABiT) [2.218042861844671]
This paper presents and discusses the technological and scientific focus of SoMABiT as a social media analysis platform using big data technology.
The main novelty of the proposed concept is the use of MapReduce and the development of a distributed algorithm toward an integrated platform that can scale to any data volume and provide social media-driven knowledge.
arXiv Detail & Related papers (2020-01-16T18:53:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.