Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times
- URL: http://arxiv.org/abs/2505.09877v1
- Date: Thu, 15 May 2025 00:47:06 GMT
- Title: Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times
- Authors: Kayo Mimizuka, Megan A Brown, Kai-Cheng Yang, Josephine Lukito
- Abstract summary: The "post-API age" has sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms. However, it remains unclear whether platforms provide adequate data access in practice. Our findings reveal significant challenges in accessing social media data. These challenges have exacerbated existing institutional, regional, and financial inequities in data access.
- Score: 5.997153455641738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over the past decade, data provided by digital platforms has informed substantial research in HCI to understand online human interaction and communication. Following the closure of major social media APIs that previously provided free access to large-scale data (the "post-API age"), emerging data access programs required by the European Union's Digital Services Act (DSA) have sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms, leading to the "post-post-API age." However, it remains unclear whether platforms provide adequate data access in practice. To assess how platforms make data available under the DSA, we conducted a comprehensive survey followed by in-depth interviews with 19 researchers to understand their experiences with data access in this new era. Our findings reveal significant challenges in accessing social media data, with researchers facing multiple barriers including complex API application processes, difficulties obtaining credentials, and limited API usability. These challenges have exacerbated existing institutional, regional, and financial inequities in data access. Based on these insights, we provide actionable recommendations for platforms, researchers, and policymakers to foster more equitable and effective data access, while encouraging broader dialogue within the CSCW community around interdisciplinary and multi-stakeholder solutions.
Related papers
- The Great Data Standoff: Researchers vs. Platforms Under the Digital Services Act [9.275892768167122]
We focus on the 2024 Romanian presidential election interference incident. This is the first event of its kind to trigger systemic risk investigations by the European Commission. By analysing this incident, we can comprehend election-related systemic risk to explore practical research tasks.
arXiv Detail & Related papers (2025-05-02T09:00:19Z) - Multi-Platform Aggregated Dataset of Online Communities (MADOC) [64.45797970830233]
MADOC aggregates and standardizes data from Bluesky, Koo, Reddit, and Voat (2012-2024), containing 18.9 million posts, 236 million comments, and 23.1 million unique users. The dataset enables comparative studies of toxic behavior evolution across platforms through standardized interaction records and sentiment analysis.
arXiv Detail & Related papers (2025-01-22T14:02:11Z) - Leveraging GPT for the Generation of Multi-Platform Social Media Datasets for Research [0.0]
Social media datasets are essential for research on disinformation, influence operations, social sensing, hate speech detection, cyberbullying, and other significant topics.
Access to these datasets is often restricted due to costs and platform regulations.
This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms.
arXiv Detail & Related papers (2024-07-11T09:12:39Z) - SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge [63.311045291016555]
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts.
This paper summarizes the challenging task, data, and research progress.
arXiv Detail & Related papers (2024-05-17T02:36:14Z) - Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - The Biased Journey of MSD_AUDIO.ZIP [5.695436409400152]
Access to the Million Song dataset has become restricted to those within certain affiliations that are connected peer-to-peer.
We draw insights from the experiences of 22 individuals who either attempted to access the data or played a role in its creation.
arXiv Detail & Related papers (2023-08-31T01:42:31Z) - Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of the proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z) - Benchmarks for Deep Off-Policy Evaluation [152.28569758144022]
We present a collection of policies that can be used for benchmarking off-policy evaluation.
The goal of our benchmark is to provide a standardized measure of progress that is motivated by a set of principles.
We provide open-source access to our data and code to foster future research in this area.
arXiv Detail & Related papers (2021-03-30T18:09:33Z) - Reliable and Efficient Long-Term Social Media Monitoring [4.389610557232119]
This technical report presents a cloud-based data collection, pre-processing, and archiving infrastructure.
We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.
arXiv Detail & Related papers (2020-05-05T19:04:56Z)