Reliable and Efficient Long-Term Social Media Monitoring
- URL: http://arxiv.org/abs/2005.02442v3
- Date: Mon, 16 Nov 2020 18:56:30 GMT
- Title: Reliable and Efficient Long-Term Social Media Monitoring
- Authors: Jian Cao, Nicholas Adams-Cohen, R. Michael Alvarez
- Abstract summary: This technical report presents a cloud-based data collection, pre-processing, and archiving infrastructure.
We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.
- Score: 4.389610557232119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media data is now widely used by many academic researchers. However,
long-term social media data collection projects, which most typically involve
collecting data from public-use APIs, often encounter issues when relying on
local-area network servers (LANs) to collect high-volume streaming social media
data over long periods of time. In this technical report, we present a
cloud-based data collection, pre-processing, and archiving infrastructure, and
argue that this system mitigates or resolves the problems most typically
encountered when running social media data collection projects on LANs at
minimal cloud-computing costs. We show how this approach works in different
cloud computing architectures, and how to adapt the method to collect streaming
data from other social media platforms.
Related papers
- Labeled Datasets for Research on Information Operations [71.34999856621306]
We present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data).
The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.
arXiv Detail & Related papers (2024-11-15T22:15:01Z)
- Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Network (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- Urban Crowdsensing using Social Media: An Empirical Study on Transformer and Recurrent Neural Networks [0.7090165638014329]
We utilize publicly available social media datasets and use them as the basis for two urban sensing problems.
One main contribution of this work is our collected dataset from Twitter and Flickr.
We demonstrate the usefulness of this dataset with two preliminary supervised learning approaches.
arXiv Detail & Related papers (2020-12-05T15:36:50Z)
- Post or Tweet: Lessons from a Study of Facebook and Twitter Usage [9.888864336862385]
This workshop paper reports on an ongoing mixed-methods study of how the same users use Facebook and Twitter, arguably the two most popular social network sites.
The overarching goal of the study is to shed light on the nuances of social media selection and cross-platform use by combining survey data about participants' motivations with usage data collected via API extraction.
arXiv Detail & Related papers (2020-11-27T15:55:02Z)
- AMUSED: An Annotation Framework of Multi-modal Social Media Data [0.0]
The framework is designed to mitigate the issues of collecting and annotating social media data.
AMUSED can be applied in multiple application domains. As a use case, we have implemented the framework for collecting COVID-19 misinformation data.
arXiv Detail & Related papers (2020-10-01T15:50:41Z)
- Wide-Area Data Analytics [4.080171822768553]
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations.
The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019.
This report summarizes the challenges discussed and the conclusions generated at the workshop.
arXiv Detail & Related papers (2020-06-17T22:44:33Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation have begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- Xtreaming: an incremental multidimensional projection technique and its application to streaming data [58.92615359254597]
Xtreaming is a novel incremental projection technique that continuously updates the visual representation to reflect new emerging structures or patterns without visiting the multidimensional data more than once.
Our tests show that Xtreaming is competitive in terms of global distance preservation when compared to other streaming and incremental techniques.
arXiv Detail & Related papers (2020-03-08T04:53:16Z)
- Curating Social Media Data [0.0]
We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data.
Our pipeline provides an automatic feature extraction from a corpus of social media data using existing in-house tools.
The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data.
arXiv Detail & Related papers (2020-02-21T10:07:15Z)
- I Know Where You Are Coming From: On the Impact of Social Media Sources on AI Model Performance [79.05613148641018]
We will study the performance of different machine learning models when trained on multi-modal data from different social networks.
Our initial experimental results reveal that social network choice impacts the performance.
arXiv Detail & Related papers (2020-02-05T11:10:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.