Related papers: I'm Sorry Dave, I'm Afraid I Can't Return That: On YouTube Search API Use in Research

URL: http://arxiv.org/abs/2506.04422v1
Date: Wed, 04 Jun 2025 20:13:42 GMT
Title: I'm Sorry Dave, I'm Afraid I Can't Return That: On YouTube Search API Use in Research
Authors: Alexandros Efstratiou
Abstract summary: We analyze the API's behavior by running identical queries across a period of 12 weeks. Our findings suggest that the search endpoint returns highly inconsistent results in ways that are not officially documented. Our results also suggest that the API may prioritize shorter, more popular videos, although the role of channel popularity is not as clear.
Score: 55.2480439325792
License: http://creativecommons.org/licenses/by/4.0/
Abstract: YouTube is among the most widely used platforms worldwide and has attracted considerable recent academic attention. Despite its popularity and the number of studies conducted on it, much less is understood about how YouTube's Data API, and especially its Search endpoint, operates. In this paper, we analyze the API's behavior by running identical queries across a period of 12 weeks. Our findings suggest that the search endpoint returns highly inconsistent results between queries in ways that are not officially documented. Specifically, the API seems to randomize returned videos based on the relative popularity of the respective topic during the query period, making it nearly impossible to obtain representative historical video samples, especially during non-peak topical periods. Our results also suggest that the API may prioritize shorter, more popular videos, although the role of channel popularity is not as clear. We conclude with suggested strategies for researchers using the API for data collection, as well as future research directions on expanding the API's use cases.
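The abstract describes issuing identical queries repeatedly over 12 weeks and comparing what comes back. The following is a minimal Python sketch of that kind of consistency audit, under stated assumptions. The request uses documented YouTube Data API v3 `search.list` parameters (`part`, `q`, `type`, `maxResults`, `publishedAfter`, `publishedBefore`, `key`), but the helper names, the placeholder API key, and the use of mean pairwise Jaccard similarity as the consistency measure are illustrative assumptions, not the paper's stated method. The code only builds request URLs and compares lists of video IDs; it makes no network calls.

```python
from itertools import combinations
from urllib.parse import urlencode

# Documented base URL of the Data API v3 Search endpoint.
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, published_after, published_before, api_key, max_results=50):
    """Build the same search.list request for every collection round.

    Hypothetical helper: keeping every parameter fixed means any difference
    between rounds comes from the API, not from the request itself.
    """
    if not 0 <= max_results <= 50:
        # search.list accepts at most 50 results per page
        raise ValueError("maxResults must be between 0 and 50")
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "publishedAfter": published_after,    # RFC 3339, e.g. "2024-01-01T00:00:00Z"
        "publishedBefore": published_before,
        "key": api_key,                       # placeholder; supply your own key
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def jaccard(a, b):
    """Jaccard similarity of two collections of video IDs (1.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_consistency(rounds):
    """Mean Jaccard similarity over all pairs of rounds of the same query.

    `rounds` is a list of video-ID lists, one per collection round
    (e.g. one per week). Returns 1.0 when there are fewer than two rounds.
    """
    pairs = list(combinations(rounds, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)
```

For example, three weekly rounds returning `["a", "b", "c"]`, `["a", "b", "d"]`, and `["a", "e", "f"]` have pairwise similarities of 0.5, 0.2, and 0.2, for a mean of about 0.3. A perfectly stable endpoint would score 1.0. Jaccard ignores ranking, so a rank-aware measure may suit audits that also care about result order.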

Related papers

Forgetful by Design? A Critical Audit of YouTube's Search API for Academic Research [0.0]
This paper critically audits the search endpoint of YouTube's Data API (v3), a common tool for academic research. We identify major limitations regarding completeness, representativeness, consistency, and bias.
arXiv Detail & Related papers (2025-06-13T12:39:59Z)
TikTok's Research API: Problems Without Explanations [2.06242362470764]
TikTok augmented its Research API access within Europe in July 2023. Despite this expansion, notable limitations and inconsistencies persist within the data provided. The API data is incomplete, making it unreliable when working with data donations.
arXiv Detail & Related papers (2025-06-11T13:50:06Z)
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval [61.414236415351446]
We propose MomentSeeker, a novel benchmark for long-video moment retrieval (LMVR). MomentSeeker is created from long and diverse videos averaging over 1200 seconds in duration. It covers a variety of real-world scenarios at three levels (global, event, and object), spanning common tasks such as action recognition, object localization, and causal reasoning.
arXiv Detail & Related papers (2025-02-18T05:50:23Z)
APIRL: Deep Reinforcement Learning for REST API Fuzzing [3.053989095162017]
APIRL is a fully automated deep reinforcement learning tool for testing REST APIs. We show APIRL can find significantly more bugs than the state-of-the-art in real-world REST APIs.
arXiv Detail & Related papers (2024-12-20T15:40:51Z)
A Systematic Evaluation of Large Code Models in API Suggestion: When, Which, and How [53.65636914757381]
API suggestion is a critical task in modern software development. Recent advancements in large code models (LCMs) have shown promise in the API suggestion task.
arXiv Detail & Related papers (2024-09-20T03:12:35Z)
A Solution-based LLM API-using Methodology for Academic Information Seeking [49.096714812902576]
SoAy is a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as the reasoning method, where a solution is a pre-constructed API calling sequence. Results show a 34.58-75.99% performance improvement compared to state-of-the-art LLM API-based baselines.
arXiv Detail & Related papers (2024-05-24T02:44:14Z)
What we can learn from TikTok through its Research API [3.424635462664968]
The recent release of a free Research API opens the door to collecting data on posted videos, associated comments, and user activities. Our study focuses on evaluating the reliability of the results returned by the Research API, by collecting and analyzing a random sample of TikTok videos posted in a span of 6 years.
arXiv Detail & Related papers (2024-02-21T14:59:49Z)
Advanced White-Box Heuristics for Search-Based Fuzzing of REST APIs [3.3714461095047743]
Currently, EvoMaster is the only existing tool that supports white-box fuzzing of REST APIs. We provide a series of novel white-box heuristics, including, for example, how to deal with under-specified constraints in API schemas. Our novel techniques are implemented as an extension to our open-source, search-based fuzzer EvoMaster.
arXiv Detail & Related papers (2023-09-15T12:39:01Z)
APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation [6.029137544885093]
API recommendation is the process of assisting developers in finding the required API among numerous candidate APIs. Previous studies mainly modeled API recommendation as a recommendation task, which may still leave developers unable to find what they need. Motivated by research in neural machine translation, we model this problem as a generation task. We propose APICom, a novel approach based on prompt learning that generates the API relevant to a query according to the prompts.
arXiv Detail & Related papers (2023-09-13T15:31:50Z)
Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos [60.86880787242561]
Video temporal grounding aims to pinpoint a video segment that matches the query description. We propose an end-to-end framework for fast temporal grounding, which is able to model an hours-long video with one-time network execution. Our method significantly outperforms state-of-the-art methods and achieves 14.6× / 102.8× higher efficiency, respectively.
arXiv Detail & Related papers (2023-03-15T03:54:43Z)
Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information. In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks. We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset. It consists of over 10,000 YouTube videos, covering a wide range of topics. Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.