4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Scams, and Malware
- URL: http://arxiv.org/abs/2412.13459v1
- Date: Wed, 18 Dec 2024 03:03:58 GMT
- Title: 4.5 Million (Suspected) Fake Stars in GitHub: A Growing Spiral of Popularity Contests, Scams, and Malware
- Authors: Hao He, Haoqin Yang, Philipp Burckhardt, Alexandros Kapravelos, Bogdan Vasilescu, Christian Kästner
- Abstract summary: We present a global, longitudinal measurement study of fake stars in GitHub.
We build StarScout, a scalable tool able to detect anomalous starring behaviors.
Our study has implications for platform moderators, open-source practitioners, and supply chain security researchers.
- Abstract: GitHub, the de-facto platform for open-source software development, provides a set of social-media-like features to signal high-quality repositories. Among them, the star count is the most widely used popularity signal, but it is also at risk of being artificially inflated (i.e., faked), decreasing its value as a decision-making signal and posing a security risk to all GitHub users. In this paper, we present a systematic, global, and longitudinal measurement study of fake stars in GitHub. To this end, we build StarScout, a scalable tool able to detect anomalous starring behaviors (i.e., low activity and lockstep) across the entire GitHub metadata. Analyzing the data collected using StarScout, we find that: (1) fake-star-related activities have rapidly surged since 2024; (2) the user profile characteristics of fake stargazers are not distinct from average GitHub users, but many of them have highly abnormal activity patterns; (3) the majority of fake stars are used to promote short-lived malware repositories masquerading as pirating software, game cheats, or cryptocurrency bots; (4) some repositories may have acquired fake stars for growth hacking, but fake stars only have a promotion effect in the short term (i.e., less than two months) and become a burden in the long term. Our study has implications for platform moderators, open-source practitioners, and supply chain security researchers.
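The abstract names two anomaly signatures that StarScout looks for: low-activity accounts and lockstep starring. A minimal sketch of what such heuristics might look like is below; all function names, thresholds, and data shapes are illustrative assumptions for exposition, not StarScout's actual implementation.

```python
from itertools import combinations

# Hypothetical thresholds; StarScout's real criteria differ in detail.
LOW_ACTIVITY_MAX_EVENTS = 2  # accounts with at most this many lifetime events
LOCKSTEP_JACCARD = 0.5       # overlap threshold between repos' stargazer sets

def low_activity_users(events_per_user):
    """Flag accounts whose entire activity history is (almost) just the star."""
    return {u for u, n in events_per_user.items() if n <= LOW_ACTIVITY_MAX_EVENTS}

def lockstep_repo_pairs(stars_by_repo):
    """Flag repository pairs starred by near-identical user groups (lockstep),
    using Jaccard similarity of their stargazer sets."""
    pairs = []
    for (r1, s1), (r2, s2) in combinations(stars_by_repo.items(), 2):
        union = len(s1 | s2)
        if union and len(s1 & s2) / union >= LOCKSTEP_JACCARD:
            pairs.append((r1, r2))
    return pairs
```

For example, two repositories sharing three of four total stargazers (Jaccard 0.75) would be flagged as a lockstep pair, while a repository starred by a disjoint set of accounts would not. Scaling pairwise comparison to all of GitHub's metadata is the hard part the paper's tool actually addresses.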
Related papers
- Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook [101.30779332427217]
We survey deepfake generation and detection techniques, including the most recent developments in the field.
We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content.
We develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content.
arXiv Detail & Related papers (2024-11-29T08:29:25Z)
- The Impact of Sanctions on GitHub Developers and Activities [10.456917780704911]
GitHub has fueled the creation of truly global software.
As software becomes more entwined with global politics and social regulations, it becomes similarly subject to government sanctions.
In 2019, GitHub restricted access to certain services for users in specific locations but rolled back these restrictions for some communities.
arXiv Detail & Related papers (2024-04-08T13:11:11Z)
- Rewrite the Stars [70.48224347277014]
Recent studies have drawn attention to the untapped potential of the "star operation" in network design.
Our study attempts to reveal the star operation's ability to map inputs into high-dimensional, non-linear feature spaces.
We introduce StarNet, a simple yet powerful prototype, demonstrating impressive performance and low latency.
arXiv Detail & Related papers (2024-03-29T04:10:07Z)
- Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub [0.0]
We use ChatGPT to understand and annotate the content published in software repositories.
We carry out a systematic study on a collection of 35.2K GitHub repositories claimed to be created for educational purposes only.
arXiv Detail & Related papers (2024-03-07T11:36:09Z)
- My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection [69.99192868521564]
Social platforms such as Twitter are under siege from a multitude of fraudulent users.
Due to the structure of social networks, most detection methods are based on graph neural networks (GNNs), which are susceptible to attacks.
We propose a node injection-based adversarial attack method designed to deceive bot detection models.
arXiv Detail & Related papers (2023-10-11T03:09:48Z)
- How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact [0.2209921757303168]
We ask whether and how authors engage in social coding related to their research.
We examine ten thousand papers in top SE research venues, hand-annotate their GitHub links, and study 309 paper-related repositories.
We find a wide distribution in popularity and impact, some strongly correlated with publication venue.
arXiv Detail & Related papers (2023-10-02T18:56:33Z)
- A Lot of Talk and a Badge: An Exploratory Analysis of Personal Achievements in GitHub [13.556531448699284]
GitHub introduced personal achievements, whereby badges are unlocked and displayed on developers' profile pages in recognition of their development activities.
We present an exploratory analysis using mixed methods to study the diffusion of personal badges in GitHub.
We find that most of the developers sampled own at least one badge, but we also observe an increasing number of users who choose to keep their profile private and opt out of displaying badges.
arXiv Detail & Related papers (2023-03-26T12:08:50Z)
- Uncovering the Dark Side of Telegram: Fakes, Clones, Scams, and Conspiracy Movements [67.39353554498636]
We perform a large-scale analysis of Telegram by collecting 35,382 different channels and over 130,000,000 messages.
We find that some infamous activities known from privacy-preserving Dark Web services, such as carding, are also present on Telegram.
We propose a machine learning model that is able to identify fake channels with an accuracy of 86%.
arXiv Detail & Related papers (2021-11-26T14:53:31Z)
- Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) textural feature enhancement block to zoom in the subtle artifacts in shallow features; 3) aggregate the low-level textural feature and high-level semantic features guided by the attention maps.
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.