Related papers: Reliable and Efficient Long-Term Social Media Monitoring

URL: http://arxiv.org/abs/2005.02442v3
Date: Mon, 16 Nov 2020 18:56:30 GMT
Title: Reliable and Efficient Long-Term Social Media Monitoring
Authors: Jian Cao, Nicholas Adams-Cohen, R. Michael Alvarez
Abstract summary: This technical report presents a cloud-based data collection, pre-processing, and archiving infrastructure. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.
Score: 4.389610557232119
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Social media data is now widely used by many academic researchers. However, long-term social media data collection projects, which most typically involve collecting data from public-use APIs, often encounter issues when relying on local-area network servers (LANs) to collect high-volume streaming social media data over long periods of time. In this technical report, we present a cloud-based data collection, pre-processing, and archiving infrastructure, and argue that this system mitigates or resolves the problems most typically encountered when running social media data collection projects on LANs at minimal cloud-computing costs. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.

Related papers

Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times [5.997153455641738]
The "post-API age" has sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms. However, it remains unclear whether platforms provide adequate data access in practice. Our findings reveal significant challenges in accessing social media data. These challenges have exacerbated existing institutional, regional, and financial inequities in data access.
arXiv Detail & Related papers (2025-05-15T00:47:06Z)
Transactional Cloud Applications: Status Quo, Challenges, and Opportunities [6.211108626014235]
The migration to the cloud has brought back data management challenges traditionally handled by database management systems. The shift to a distributed computing infrastructure introduced new issues, such as message delivery, task scheduling, containerization, and (auto)scaling. This tutorial aims to highlight recent trends in the area and discuss open research challenges for the data management community.
arXiv Detail & Related papers (2025-04-23T21:35:40Z)
Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z)
Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data). The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z)
Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server. We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources. We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset composed of publicly available Remote Sensing data. We use Convolutional Neural Network (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
Urban Crowdsensing using Social Media: An Empirical Study on Transformer and Recurrent Neural Networks [0.7090165638014329]
We utilize publicly available social media datasets and use them as the basis for two urban sensing problems. One main contribution of this work is our collected dataset from Twitter and Flickr. We demonstrate the usefulness of this dataset with two preliminary supervised learning approaches.
arXiv Detail & Related papers (2020-12-05T15:36:50Z)
Post or Tweet: Lessons from a Study of Facebook and Twitter Usage [9.888864336862385]
This workshop paper reports on an ongoing mixed-methods study of arguably the two most popular social network sites, Facebook and Twitter, as used by the same users. The overarching goal of the study is to shed light on the nuances of social media selection and cross-platform use by combining survey data about participants' motivations with usage data collected via API extraction.
arXiv Detail & Related papers (2020-11-27T15:55:02Z)
AMUSED: An Annotation Framework of Multi-modal Social Media Data [0.0]
The framework is designed to mitigate the issues of collecting and annotating social media data. AMUSED can be applied in multiple application domains; as a use case, we have implemented the framework for collecting COVID-19 misinformation data.
arXiv Detail & Related papers (2020-10-01T15:50:41Z)
Wide-Area Data Analytics [4.080171822768553]
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.
arXiv Detail & Related papers (2020-06-17T22:44:33Z)
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high-performance, and scalable computation. The rapid rate and volume of data creation have begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
Xtreaming: an incremental multidimensional projection technique and its application to streaming data [58.92615359254597]
Xtreaming is a novel incremental projection technique that continuously updates the visual representation to reflect new emerging structures or patterns without visiting the multidimensional data more than once. Our tests show that Xtreaming is competitive in terms of global distance preservation when compared to other streaming and incremental techniques.
arXiv Detail & Related papers (2020-03-08T04:53:16Z)
Curating Social Media Data [0.0]
We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv Detail & Related papers (2020-02-21T10:07:15Z)
I Know Where You Are Coming From: On the Impact of Social Media Sources on AI Model Performance [79.05613148641018]
We will study the performance of different machine learning models when trained on multi-modal data from different social networks. Our initial experimental results reveal that social network choice impacts the performance.
arXiv Detail & Related papers (2020-02-05T11:10:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.