Related papers: Launch-Day Diffusion: Tracking Hacker News Impact on GitHub Stars for AI Tools

URL: http://arxiv.org/abs/2511.04453v1
Date: Thu, 06 Nov 2025 15:23:50 GMT
Title: Launch-Day Diffusion: Tracking Hacker News Impact on GitHub Stars for AI Tools
Authors: Obada Kraishan
Abstract summary: Social news platforms have become key launch outlets for open-source projects, especially Hacker News. This paper presents a reproducible demonstration system that tracks how HN exposure translates into GitHub star growth for AI and LLM tools.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Social news platforms have become key launch outlets for open-source projects, especially Hacker News (HN), though quantifying their immediate impact remains challenging. This paper presents a reproducible demonstration system that tracks how HN exposure translates into GitHub star growth for AI and LLM tools. Built entirely on public APIs, our pipeline analyzes 138 repository launches from 2024-2025 and reveals substantial launch effects: repositories gain an average of 121 stars within 24 hours, 189 stars within 48 hours, and 289 stars within a week of HN exposure. Through machine learning models (Elastic Net) and non-linear approaches (Gradient Boosting), we identify key predictors of viral growth. Posting timing appears as a key factor (launching at optimal hours can mean hundreds of additional stars), while the "Show HN" tag shows no statistical advantage after controlling for other factors. The demonstration completes in under five minutes on standard hardware, automatically collecting data, training models, and generating visualizations through single-file scripts. This makes our findings immediately reproducible and the framework easy to extend to other platforms, providing both researchers and developers with actionable insights into launch dynamics.
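
The 24-hour, 48-hour, and one-week star gains reported in the abstract can be illustrated with a minimal sketch. It assumes per-star timestamps have already been collected; the function name `star_gains`, the window sizes expressed in hours, and the sample timestamps are hypothetical and are not taken from the paper's scripts.

```python
from datetime import datetime, timedelta, timezone


def star_gains(posted_at, starred_at, windows_hours=(24, 48, 168)):
    """Count stars received within each window after the HN post time.

    posted_at: timezone-aware datetime of the HN submission (assumed input).
    starred_at: iterable of timezone-aware datetimes, one per star (assumed input).
    """
    starred_at = list(starred_at)
    gains = {}
    for hours in windows_hours:
        cutoff = posted_at + timedelta(hours=hours)
        # Stars given before the post are excluded from the launch effect.
        gains[hours] = sum(1 for t in starred_at if posted_at <= t < cutoff)
    return gains


# Hypothetical data: offsets in hours relative to the post time.
posted = datetime(2025, 1, 6, 15, 0, tzinfo=timezone.utc)
stars = [posted + timedelta(hours=h) for h in (-5, 1, 3, 20, 30, 47, 100, 200)]
print(star_gains(posted, stars))  # {24: 3, 48: 5, 168: 6}
```

Averaging these per-repository counts across all launches would give figures comparable in form to those quoted in the abstract, though the paper's exact windowing rules may differ.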

Related papers

Toward Training Superintelligent Software Agents through Self-Play SWE-RL [66.11447353341926]
Self-play SWE-RL is a first step toward training paradigms for superintelligent software agents. Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies. Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
arXiv Detail & Related papers (2025-12-21T00:49:40Z)
CWM: An Open-Weights LLM for Research on Code Generation with World Models [78.0342683953353]
We release Code World Model (CWM) to advance research on code generation with world models. We mid-train CWM on a large amount of observation-action trajectories from Python interpreter and agentic Docker environments. We present first steps of how world models can benefit agentic coding, enable step-by-step simulation of Python code execution, and show early results of how reasoning can benefit the latter.
arXiv Detail & Related papers (2025-09-30T21:47:10Z)
Beneath the Mask: Can Contribution Data Unveil Malicious Personas in Open-Source Projects? [0.0]
This paper demonstrates how Open Source Intelligence (OSINT) data gathered from GitHub contributions, analyzed using graph databases and graph theory, can efficiently identify anomalous behaviors exhibited by the "JiaT75" personas across other open-source projects.
arXiv Detail & Related papers (2025-08-19T02:17:48Z)
VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models [3.4299920908334673]
VulGuard is an automated tool designed to streamline the extraction, processing, and analysis of commits from GitHub repositories for vulnerability prediction (JIT-VP) research. It automatically mines commit histories, extracts fine-grained code changes, commit messages, and software engineering metrics, and formats them for downstream analysis.
arXiv Detail & Related papers (2025-07-22T15:18:44Z)
Mercury: Ultra-Fast Language Models Based on Diffusion [58.52391675075641]
We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. Mercury Coder comes in two sizes: Mini and Small. Based on independent evaluations, Mercury Coder Mini and Mercury Coder Small achieve state-of-the-art throughputs of 1109 tokens/sec and 737 tokens/sec, respectively.
arXiv Detail & Related papers (2025-06-17T17:06:18Z)
The TESS Ten Thousand Catalog: 10,001 uniformly-vetted and -validated Eclipsing Binary Stars detected in Full-Frame Image data by machine learning and analyzed by citizen scientists [0.5381115559554392]
We present a catalog of 10001 uniformly-vetted and -validated eclipsing binary stars that passed all our ephemeris and photocenter tests. We outline the detection and analysis of the targets, discuss the properties of the sample, and highlight potentially interesting systems.
arXiv Detail & Related papers (2025-06-05T23:29:13Z)
Six Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware [52.84746696418136]
We present a systematic, global, and longitudinal measurement study of fake stars in GitHub. We build StarScout, a scalable tool able to detect anomalous starring behaviors across all GitHub metadata between 2019 and 2024. Analyzing the data collected using StarScout, we find that: (1) fake-star-related activities have rapidly surged in 2024; (2) the accounts and repositories in fake star campaigns have highly trivial activity patterns; and the majority of fake stars are used to promote short-lived phishing malware repositories.
arXiv Detail & Related papers (2024-12-18T03:03:58Z)
Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023 [18.616716369775883]
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments. The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open source C based projects. The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
arXiv Detail & Related papers (2023-10-27T14:13:23Z)
Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected every ten seconds per frame within 32 different cities under different weather conditions, periods and location scenes. We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.