TikTok's Research API: Problems Without Explanations
- URL: http://arxiv.org/abs/2506.09746v2
- Date: Thu, 12 Jun 2025 11:44:47 GMT
- Title: TikTok's Research API: Problems Without Explanations
- Authors: Carlos Entrena-Serrano, Martin Degeling, Salvatore Romano, Raziye Buse Çetin
- Abstract summary: TikTok augmented its Research API access within Europe in July 2023. Despite this expansion, notable limitations and inconsistencies persist within the data provided. The API data is incomplete, making it unreliable when working with data donations.
- Score: 2.06242362470764
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Following the Digital Services Act of 2023, which requires Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) to facilitate data accessibility for independent research, TikTok augmented its Research API access within Europe in July 2023. This action was intended to ensure compliance with the DSA, bolster transparency, and address systemic risks. Nonetheless, research findings reveal that despite this expansion, notable limitations and inconsistencies persist within the data provided. Our experiment reveals that the API fails to provide metadata for one in eight videos provided through data donations, including official TikTok videos, advertisements, and content from specific accounts, without an apparent reason. The API data is incomplete, making it unreliable when working with data donations, a prominent methodology for algorithm audits and research on platform accountability. To monitor the functionality of the API and eventual fixes implemented by TikTok, we publish a dashboard with a daily check of the availability of 10 videos that were not retrievable in the last month. The video list includes very well-known accounts, notably that of Taylor Swift. The current API lacks the necessary capabilities for thorough independent research and scrutiny. It is crucial to support and safeguard researchers who utilize data scraping to independently validate the platform's data quality.
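The monitoring dashboard the abstract describes boils down to a simple daily loop: query the Research API for each tracked video ID and record whether metadata comes back. A minimal sketch of that check, assuming a caller-supplied `fetch_metadata` wrapper around one API call (the function names and return convention are illustrative, not TikTok's documented schema):

```python
from typing import Callable, Dict, Iterable, Optional

def check_availability(
    video_ids: Iterable[str],
    fetch_metadata: Callable[[str], Optional[dict]],
) -> Dict[str, bool]:
    """Return {video_id: True if the API returned metadata, else False}.

    `fetch_metadata` wraps one Research API call and is assumed to
    return None when the API yields no record for the video (the
    failure mode the paper documents for roughly 1 in 8 donated videos).
    """
    report = {}
    for vid in video_ids:
        record = fetch_metadata(vid)  # one API round-trip per video
        report[vid] = record is not None
    return report

def missing_rate(report: Dict[str, bool]) -> float:
    """Fraction of checked videos with no retrievable metadata."""
    return sum(not ok for ok in report.values()) / len(report)
```

In a real deployment, `fetch_metadata` would POST the video ID to TikTok's Research API video-query endpoint with an OAuth token; the dashboard then re-runs this check daily and plots `missing_rate` over time.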
Related papers
- Differentially Private Synthetic Data Release for Topics API Outputs [63.79476766779742]
We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox. We generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset.
arXiv Detail & Related papers (2025-06-30T13:46:57Z)
- I'm Sorry Dave, I'm Afraid I Can't Return That: On YouTube Search API Use in Research [55.2480439325792]
We analyze the API's behavior by running identical queries across a period of 12 weeks. Our findings suggest that the search endpoint returns highly inconsistent results in ways that are not officially documented. Our results also suggest that the API may prioritize shorter, more popular videos, although the role of channel popularity is not as clear.
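The inconsistency measured in this study can be quantified by repeating an identical query and comparing the result sets; one common metric is the Jaccard similarity between the sets of video IDs returned on different runs. A small sketch (the function names are illustrative, not the paper's code):

```python
from itertools import combinations
from typing import List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; 1.0 means identical result sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_consistency(runs: List[Set[str]]) -> float:
    """Average Jaccard similarity over all pairs of repeated runs
    of the same query; values well below 1.0 indicate the endpoint
    is returning unstable results."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Tracking this score over 12 weekly runs of the same query would surface the kind of undocumented churn the paper reports.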
arXiv Detail & Related papers (2025-06-04T20:13:42Z) - Post-Post-API Age: Studying Digital Platforms in Scant Data Access Times [5.997153455641738]
The "post-API age" has sparked optimism about increased platform transparency and renewed opportunities for comprehensive research on digital platforms.<n>However, it remains unclear whether platforms provide adequate data access in practice.<n>Our findings reveal significant challenges in accessing social media data.<n>These challenges have exacerbated existing institutional, regional, and financial inequities in data access.
arXiv Detail & Related papers (2025-05-15T00:47:06Z) - The Great Data Standoff: Researchers vs. Platforms Under the Digital Services Act [9.275892768167122]
We focus on the 2024 Romanian presidential election interference incident. This is the first event of its kind to trigger systemic risk investigations by the European Commission. By analysing this incident, we can better understand election-related systemic risk and explore practical research tasks.
arXiv Detail & Related papers (2025-05-02T09:00:19Z) - Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like buffer overflows and use-after-free errors. Traditional fuzzing struggles with the complexity and API diversity of DL libraries. We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - What we can learn from TikTok through its Research API [3.424635462664968]
The recent release of a free Research API opens the door to collecting data on posted videos, associated comments, and user activities.
Our study focuses on evaluating the reliability of the results returned by the Research API, by collecting and analyzing a random sample of TikTok videos posted in a span of 6 years.
arXiv Detail & Related papers (2024-02-21T14:59:49Z) - Analyzing User Engagement with TikTok's Short Format Video Recommendations using Data Donations [31.764672446151412]
We analyze user engagement on TikTok using data we collect via a data donation system.
We find that the average daily usage time increases over the users' lifetime while the user attention remains stable at around 45%.
We also find that users like more videos uploaded by people they follow than those recommended by people they do not follow.
arXiv Detail & Related papers (2023-01-12T11:34:45Z) - Black-box Dataset Ownership Verification via Backdoor Watermarking [67.69308278379957]
We formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model.
We propose to embed external patterns via backdoor watermarking for the ownership verification to protect them.
Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification.
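The hypothesis-test step can be sketched with a one-sided binomial test: if a suspicious model predicts the watermark's target label on poisoned inputs far more often than chance, the dataset owner gains statistical evidence that the watermarked dataset was used for training. The chance rate and significance threshold below are illustrative assumptions, not the paper's exact formulation:

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def watermark_detected(hits: int, trials: int,
                       chance: float = 0.1, alpha: float = 0.01) -> bool:
    """Reject H0 ('model behaves at chance on watermarked inputs')
    when the target-label hit rate is significantly above `chance`
    (assumed here to be 1/num_classes for a 10-class task)."""
    return binom_tail(hits, trials, chance) < alpha
```

For example, a model that outputs the target label on 45 of 50 watermarked queries yields overwhelming evidence, whereas 5 of 50 is consistent with chance.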
arXiv Detail & Related papers (2022-08-04T05:32:20Z) - DataLab: A Platform for Data Analysis and Intervention [96.75253335629534]
DataLab is a unified data-oriented platform that allows users to interactively analyze the characteristics of data.
DataLab has features for dataset recommendation and global vision analysis.
So far, DataLab covers 1,715 datasets and 3,583 transformed versions of them.
arXiv Detail & Related papers (2022-02-25T18:32:19Z) - Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and the scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- An Empirical Investigation of Personalization Factors on TikTok [77.34726150561087]
Despite the importance of TikTok's algorithm to the platform's success and content distribution, little work has been done on the empirical analysis of the algorithm.
Using a sock-puppet audit methodology with a custom algorithm developed by us, we tested and analysed the effect of the language and location used to access TikTok.
We identify that the follow-feature has the strongest influence, followed by the like-feature and video view rate.
arXiv Detail & Related papers (2022-01-28T17:40:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.