Related papers: Curating Social Media Data

URL: http://arxiv.org/abs/2002.09202v1
Date: Fri, 21 Feb 2020 10:07:15 GMT
Title: Curating Social Media Data
Authors: Kushal Vaghani
Abstract summary: We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to its immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Therefore, data-driven analytics of social imprints has become a vital asset for organisations and governments to further improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data. For the purposes of this research, we use Twitter as our motivational social media data platform due to its popularity.
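
Illustrative sketch (not from the paper): the three stages named in the abstract, feature extraction, automated correction, and micro-task creation for crowd correction, could be arranged roughly as below. Every name here (CuratedTweet, ABBREVIATIONS, extract_features, curate), the regular expressions, and the tiny lookup table are hypothetical and chosen only for illustration; CrowdCorrect's actual tools and interfaces are not described in this listing.

```python
import re
from dataclasses import dataclass, field

# Hypothetical lookup table standing in for an automated correction service.
ABBREVIATIONS = {"govt": "government", "pls": "please", "thx": "thanks"}


@dataclass
class CuratedTweet:
    text: str
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)
    urls: list = field(default_factory=list)
    corrected_tokens: list = field(default_factory=list)
    micro_tasks: list = field(default_factory=list)


def extract_features(text):
    """Pull simple surface features (hashtags, mentions, URLs) out of a tweet."""
    hashtags = re.findall(r"#(\w+)", text)
    mentions = re.findall(r"@(\w+)", text)
    urls = re.findall(r"https?://\S+", text)
    return hashtags, mentions, urls


def curate(text, vocabulary):
    """Automated pass first; tokens it cannot resolve become crowd micro-tasks."""
    hashtags, mentions, urls = extract_features(text)
    tweet = CuratedTweet(text, hashtags, mentions, urls)
    # Drop URLs, hashtags, and mentions so that only plain words are tokenized.
    plain = re.sub(r"https?://\S+|[#@]\w+", " ", text)
    for token in re.findall(r"[a-z']+", plain.lower()):
        if token in ABBREVIATIONS:
            # Automated correction: expand a known abbreviation.
            tweet.corrected_tokens.append(ABBREVIATIONS[token])
            continue
        tweet.corrected_tokens.append(token)
        if token not in vocabulary:
            # Unknown token: keep it as-is and ask the crowd to review it.
            tweet.micro_tasks.append(
                f'Is "{token}" a misspelling, an abbreviation, or jargon? '
                "Suggest a replacement."
            )
    return tweet
```

Calling curate("Pls fund the helth budget @treasury #auspol https://example.org/x", {"fund", "the", "budget"}) expands "pls" automatically and leaves "helth" for a crowd worker, which is the kind of split between automated and crowd-sourced correction the abstract refers to as dual correction.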

Related papers

SoMe: A Realistic Benchmark for LLM-based Social Media Agents [64.05026384906915]
SoMe is a benchmark designed to evaluate social media agents equipped with various agent tools for accessing and analyzing social media data. SoMe comprises a diverse collection of 8 social media agent tasks, 9,164,284 posts, 6,591 user profiles, and 25,686 reports from various social media platforms and external websites. By extensive quantitative and qualitative analysis, we provide the first overview into the performance of mainstream agentic LLMs in realistic social media environments.
arXiv Detail & Related papers (2025-12-09T08:36:09Z)
Decentralized Social Media and Artificial Intelligence in Digital Public Health Monitoring [0.6235924228436546]
We argue that digital public health surveillance must adapt by embracing new platforms and methodologies. We discuss the rise of decentralized social networks like Mastodon and Bluesky as alternative data sources.
arXiv Detail & Related papers (2025-12-03T19:54:59Z)
What's the next frontier for Data-centric AI? Data Savvy Agents [71.76058707995398]
We argue that data-savvy capabilities should be a top priority in the design of agentic systems. We propose four key capabilities to realize this vision: Proactive data acquisition, Sophisticated data processing, Interactive test data synthesis, and Continual adaptation.
arXiv Detail & Related papers (2025-11-02T17:09:29Z)
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future [59.78608958395464]
We build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets. Our infrastructure allows us to analyze existing dataset efforts, and also evaluate language models' performance in different social intelligence aspects. We show there is a need for multifaceted datasets, increased diversity in language and culture, more long-tailed social situations, and more interactive data in future social intelligence data efforts.
arXiv Detail & Related papers (2024-02-28T00:22:42Z)
Countering Misinformation via Emotional Response Generation [15.383062216223971]
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and democracy. Previous research has shown how social correction can be an effective way to curb misinformation. We present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs.
arXiv Detail & Related papers (2023-11-17T15:37:18Z)
Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting [74.68371461260946]
SocialSense is a framework that induces a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics. Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings.
arXiv Detail & Related papers (2023-10-20T06:17:02Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information. To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles. Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
Dynamic Datasets and Market Environments for Financial Reinforcement Learning [68.11692837240756]
FinRL-Meta is a library that processes dynamic datasets from real-world markets into gym-style market environments. We provide examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance.
arXiv Detail & Related papers (2023-04-25T22:17:31Z)
Analyzing social media with crowdsourcing in Crowd4SDG [1.1403672224109254]
This study presents an approach that provides flexible support for analyzing social media, particularly during emergencies. The focus is on analyzing images and text contained in social media posts and a set of automatic data processing tools for filtering, classification, and geolocation of content. Such support includes both feedback and suggestions to configure automated tools, and crowdsourcing to gather inputs from citizens.
arXiv Detail & Related papers (2022-08-04T14:42:20Z)
Designing a Social Media Analytics Dashboard for Government Agency Crisis Communications [0.0]
Government agencies are increasingly turning to social media to use it as a mouthpiece in times of crisis. Government agencies need tools that support them in analysing social media data for the public good. This paper presents a design science research approach that guides the development of a social media analytics dashboard for a regional government agency.
arXiv Detail & Related papers (2022-02-11T10:41:01Z)
Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS". Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
Birdspotter: A Tool for Analyzing and Labeling Twitter Users [12.558187319452657]
Birdspotter is a tool to analyze and label Twitter users. Birdspotter.ml is an exploratory visualizer for the computed metrics. We show how to train birdspotter into a fully-fledged bot detector.
arXiv Detail & Related papers (2020-12-04T02:25:07Z)
Knowledge Discovery from Social Media using Big Data provided Sentiment Analysis (SoMABiT) [2.218042861844671]
This paper presents and discusses the technological and scientific focus of SoMABiT as a social media analysis platform using big data technology. The main novelty of the proposed concept is the use of MapReduce and the development of a distributed algorithm towards an integrated platform that can scale to any data volume and provide social media-driven knowledge.
arXiv Detail & Related papers (2020-01-16T18:53:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.