BotHawk: An Approach for Bots Detection in Open Source Software Projects
- URL: http://arxiv.org/abs/2307.13386v1
- Date: Tue, 25 Jul 2023 10:15:38 GMT
- Title: BotHawk: An Approach for Bots Detection in Open Source Software Projects
- Authors: Fenglin Bi, Zhiwei Zhu, Wei Wang, Xiaoya Xia, Hassan Ali Khan, Peng Pu
- Abstract summary: This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with maximum possible accuracy.
We've identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions.
Our team created BotHawk, a highly effective model for detecting bots in open-source software projects.
- Score: 4.59229477803039
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social coding platforms have revolutionized collaboration in software
development, leading to the use of software bots to streamline operations.
However, the presence of open-source software (OSS) bots gives rise to problems
including impersonation, spamming, bias, and security risks. Identifying bot
accounts and behavior is a challenging task in OSS projects. This research
aims to investigate bots' behavior in open-source software projects and
identify bot accounts with maximum possible accuracy. Our team gathered a
dataset of 19,779 accounts that meet standardized criteria to enable future
research on bots in open-source projects. We follow a rigorous workflow to
ensure that the data we collect is accurate, generalizable, scalable, and
up-to-date. We've identified four types of bot accounts in open-source software
projects by analyzing their behavior across 17 features in 5 dimensions. Our
team created BotHawk, a highly effective model for detecting bots in
open-source software projects. It outperforms other models, achieving an AUC of
0.947 and an F1-score of 0.89. BotHawk can detect a wider variety of bots,
including CI/CD and scanning bots. Furthermore, we find that the number of
followers, number of repositories, and tags contain the most relevant features
to identify the account type.
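The AUC and F1-score above are standard binary-classification metrics. A minimal Python sketch of both follows; the labels and scores are invented toy values, not BotHawk's data, and the 0.5 decision threshold is an assumption. AUC is computed here as the share of (bot, human) pairs in which the bot receives the higher score, with ties counting half.

```python
from itertools import product


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (bot, human) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bot and one human account")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 = 2PR / (P + R) for predictions score >= threshold (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = bot account, 0 = human account (invented values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(roc_auc(labels, scores))   # about 0.889 (8 of 9 pairs ranked correctly)
print(f1_score(labels, scores))  # about 0.667 (2 TP, 1 FP, 1 FN)
```

AUC is threshold-free, while F1 depends on where the cut-off is placed, which is why papers often report both.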
Related papers
- The Perceptions of Software Engineers Concerning the Utilization of Bots in the OSS Development Process: An Exploratory Survey [1.663160284499972]
Bots provide daily support to professionals by enhancing productivity and facilitating task automation.
Current bots are not yet sufficiently intelligent; they raise new challenges and point to enhancements that could help bot designers develop additional functionality and integrations.
(2024-11-14T14:16:03Z)
- Unmasking Social Bots: How Confident Are We? [41.94295877935867]
We propose to address both bot detection and the quantification of uncertainty at the account level.
This dual focus is crucial as it allows us to leverage additional information related to the quantified uncertainty of each prediction.
Specifically, our approach facilitates targeted interventions for bots when predictions are made with high confidence and suggests caution (e.g., gathering more data) when predictions are uncertain.
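One way to act on per-account uncertainty is sketched below in Python. It assumes several models score the same account and uses the spread of their bot probabilities as a stand-in uncertainty estimate; the function name `triage` and both thresholds are hypothetical, and the paper's own quantification method may differ.

```python
from statistics import mean, pstdev


def triage(ensemble_probs, decision_threshold=0.5, max_spread=0.1):
    """Decide an action from several models' bot probabilities for one account.

    The mean is the point prediction; the population standard deviation across
    models serves as a simple (assumed) uncertainty estimate.
    """
    p = mean(ensemble_probs)
    spread = pstdev(ensemble_probs)
    if spread > max_spread:
        # Models disagree: do not act yet, collect more evidence first.
        return "uncertain: gather more data"
    return "bot: intervene" if p >= decision_threshold else "human: no action"


print(triage([0.92, 0.95, 0.90]))  # bot: intervene
print(triage([0.2, 0.8, 0.5]))     # uncertain: gather more data
print(triage([0.1, 0.05, 0.12]))   # human: no action
```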
(2024-07-18T22:33:52Z)
- What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection [48.572932773403274]
We investigate the opportunities and risks of large language models in social bot detection.
We propose a mixture-of-heterogeneous-experts framework to divide and conquer diverse user information modalities.
Experiments show that instruction tuning on 1,000 annotated examples produces specialized LLMs that outperform state-of-the-art baselines.
(2024-02-01T06:21:19Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
(2023-05-31T09:12:35Z)
- MulBot: Unsupervised Bot Detection Based on Multivariate Time Series [2.525739800601558]
MulBot is an unsupervised bot detector based on multidimensional temporal features extracted from user timelines.
On a binary classification task, MulBot achieves an F1-score of 0.99, outperforming state-of-the-art methods.
We also demonstrate MulBot's strengths in a novel and practically-relevant task: detecting and separating different botnets.
(2022-09-21T13:56:12Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data-driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
(2022-07-27T09:26:15Z)
- A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process.
This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments from 5,000 distinct GitHub accounts.
We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns.
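The comment-based features above can be sketched in Python as follows. Grouping comments by a digit-normalized exact match (a stand-in for the paper's pattern detection) and measuring inequality with a Gini coefficient over pattern sizes are both assumptions, as are the helper names `normalize`, `gini`, and `comment_features`.

```python
import re
from collections import Counter


def normalize(comment):
    """Collapse digits and whitespace so templated comments map to one pattern (an assumption)."""
    text = re.sub(r"\d+", "<num>", comment.strip().lower())
    return re.sub(r"\s+", " ", text)


def gini(counts):
    """Gini coefficient of pattern sizes: 0 = evenly spread, near 1 = one pattern dominates."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with x sorted ascending and i = 1..n.
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def comment_features(comments):
    """Per-account features: empty/non-empty counts, pattern count, pattern inequality."""
    empty = sum(1 for c in comments if not c.strip())
    non_empty = [c for c in comments if c.strip()]
    patterns = Counter(normalize(c) for c in non_empty)
    return {
        "empty_comments": empty,
        "non_empty_comments": len(non_empty),
        "comment_patterns": len(patterns),
        "pattern_inequality": gini(list(patterns.values())),
    }


comments = [
    "Coverage increased to 91%",
    "Coverage increased to 92%",
    "Coverage increased to 90%",
    "Thanks, looks good to me!",
    "",
]
print(comment_features(comments))
# {'empty_comments': 1, 'non_empty_comments': 4, 'comment_patterns': 2, 'pattern_inequality': 0.25}
```

A bot posting templated messages tends to produce few patterns with very uneven sizes, whereas a human's comments rarely collapse into repeated templates.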
(2020-10-07T09:30:52Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
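The maximum rule can be sketched as follows: each specialized classifier scores the account, and the highest score (together with the class that produced it) becomes the ensemble's output. The two specialists and their feature names below are hypothetical placeholders for trained models.

```python
from typing import Callable, Dict, Mapping, Tuple

Features = Mapping[str, float]
Specialist = Callable[[Features], float]  # returns a bot probability in [0, 1]


def max_rule(account: Features, specialists: Dict[str, Specialist]) -> Tuple[float, str]:
    """Bot score = highest score among class-specialized classifiers (maximum rule)."""
    scores = {name: clf(account) for name, clf in specialists.items()}
    best = max(scores, key=scores.__getitem__)
    return scores[best], best


# Hypothetical specialists, each tuned to one bot class; real ones would be trained models.
specialists: Dict[str, Specialist] = {
    "spam_bot": lambda f: min(1.0, f["posts_per_day"] / 100),
    "fake_follower": lambda f: 1.0 if f["followers"] < 5 and f["following"] > 1000 else 0.1,
}

score, kind = max_rule({"posts_per_day": 80, "followers": 200, "following": 150}, specialists)
print(score, kind)  # 0.8 spam_bot
```

Taking the maximum means an account only needs to look like one known bot class to be flagged, and the winning specialist doubles as a guess at the bot type.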
(2020-06-11T22:59:59Z)
- BeCAPTCHA-Mouse: Synthetic Mouse Trajectories and Improved Bot Detection [78.11535724645702]
We present BeCAPTCHA-Mouse, a bot detector based on a neuromotor model of mouse dynamics.
BeCAPTCHA-Mouse detects highly realistic bot trajectories with 93% accuracy on average, using only one mouse trajectory.
(2020-05-02T17:40:49Z)
- Detecting and Characterizing Bots that Commit Code [16.10540443996897]
We propose a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the commits.
We have compiled a shareable dataset containing detailed information about the 461 bots we found (each with more than 1,000 commits) and the 13,762,430 commits they created.
(2020-03-02T21:54:07Z)
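As a rough illustration of two of the signals named in the entry above (author names and commit messages), the Python sketch below flags an account if its author name looks like a bot name or if most of its commit messages are identical. The regular expression, the 0.8 threshold, and the helper names are assumptions, not the paper's detector; name rules alone also misfire on human names such as Talbot, one reason to combine several signals.

```python
import re
from collections import Counter

# Assumed name rule: a "[bot]" suffix, "bot" as a separate token, or a name ending in "bot".
BOT_NAME = re.compile(r"\[bot\]|\bbot\b|bot$", re.IGNORECASE)


def name_flag(author_name: str) -> bool:
    """True if the author name matches the (hypothetical) bot-name pattern."""
    return bool(BOT_NAME.search(author_name))


def template_share(messages):
    """Share of commits whose message equals the author's most common message."""
    if not messages:
        return 0.0
    most_common_count = Counter(messages).most_common(1)[0][1]
    return most_common_count / len(messages)


def looks_like_bot(author_name, messages, share_threshold=0.8):
    """Combine the two signals with a simple OR (threshold is an assumption)."""
    return name_flag(author_name) or template_share(messages) >= share_threshold


print(looks_like_bot("dependabot[bot]", ["Bump lodash from 4.17.20 to 4.17.21"]))  # True
print(looks_like_bot("Alice Smith", ["Fix typo", "Add tests", "Refactor parser"]))  # False
```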