Detecting and Characterizing Bots that Commit Code
- URL: http://arxiv.org/abs/2003.03172v3
- Date: Fri, 27 Mar 2020 20:47:56 GMT
- Title: Detecting and Characterizing Bots that Commit Code
- Authors: Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus
- Abstract summary: We propose a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the commits.
We have compiled a shareable dataset containing detailed information about 461 bots we found (all of whom have more than 1000 commits) and 13,762,430 commits they created.
- Score: 16.10540443996897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Some developer activity traditionally performed manually, such as
making code commits, opening, managing, or closing issues is increasingly
subject to automation in many OSS projects. Specifically, such activity is
often performed by tools that react to events or run at specific times. We
refer to such automation tools as bots and, in many software mining scenarios
related to developer productivity or code quality, it is desirable to identify
bots in order to separate their actions from actions of individuals. Aim: Find
an automated way of identifying bots and code committed by these bots, and to
characterize the types of bots based on their activity patterns. Method and
Result: We propose BIMAN, a systematic approach to detect bots using author
names, commit messages, files modified by the commit, and projects associated
with the commits. For our test data, the value for AUC-ROC was 0.9. We also
characterized these bots based on the time patterns of their code commits and
the types of files modified, and found that they primarily work with
documentation files and web pages, and these files are most prevalent in HTML
and JavaScript ecosystems. We have compiled a shareable dataset containing
detailed information about 461 bots we found (all of whom have more than 1000
commits) and 13,762,430 commits they created.
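To make the detection idea and the reported AUC-ROC concrete, here is a minimal sketch under stated assumptions. It is not the authors' BIMAN implementation. It scores an author from two of the signals the abstract names (author name and commit messages) and then computes AUC-ROC as the probability that a randomly chosen bot outscores a randomly chosen human. The regex `BOT_NAME`, the hash/number masking rules, the equal weighting in `bot_score`, and the toy authors and labels are all hypothetical. The files-modified and project signals that BIMAN also uses are omitted.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical name rule: "bot" as a standalone token (checked on the lowercased name),
# so "renovate-bot" and "dependabot[bot]" match while "Abbott" and "robotics" do not.
BOT_NAME = re.compile(r"(?:^|[^a-z])bot(?:[^a-z]|$)")

# Hypothetical normalization: mask commit hashes, then numbers and version strings.
HEX_HASH = re.compile(r"\b[0-9a-f]{7,40}\b")
NUMBER = re.compile(r"\d+(?:\.\d+)*")

def name_looks_like_bot(author: str) -> bool:
    return BOT_NAME.search(author.lower()) is not None

def message_template(msg: str) -> str:
    """Reduce a commit message to a template by masking volatile tokens."""
    return NUMBER.sub("<NUM>", HEX_HASH.sub("<HASH>", msg.lower()))

def template_share(messages: list[str]) -> float:
    """Fraction of an author's commits that use their most common template."""
    if not messages:
        return 0.0
    counts = Counter(message_template(m) for m in messages)
    return counts.most_common(1)[0][1] / len(messages)

def bot_score(author: str, messages: list[str]) -> float:
    """Toy score in [0, 1]; the equal 0.5/0.5 weighting is an arbitrary assumption."""
    return 0.5 * name_looks_like_bot(author) + 0.5 * template_share(messages)

def auc_roc(scores: list[float], labels: list[int]) -> float:
    """AUC-ROC as P(random bot outscores random human), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical toy authors and labels (1 = bot, 0 = human).
authors = {
    "renovate-bot": ["Update dependency lodash to 4.17.21", "Update dependency jest to 29.7.0"],
    "alice": ["Fix off-by-one in parser", "Add README section on install"],
    "ci-builder": ["Build 1041", "Build 1042", "Build 1043"],
    "bob": ["Refactor cache", "Refactor cache"],
}
labels = {"renovate-bot": 1, "alice": 0, "ci-builder": 1, "bob": 0}

scores = [bot_score(a, msgs) for a, msgs in authors.items()]
auc = auc_roc(scores, [labels[a] for a in authors])
print(auc)  # 0.875
```

On this toy data `renovate-bot` scores 0.75, `ci-builder` and `bob` both score 0.5, and `alice` scores 0.25. One bot/human pair therefore ties, and the AUC is 3.5/4 = 0.875. By the same reading, an AUC-ROC of 0.9 means that a randomly chosen bot is ranked above a randomly chosen human about 90% of the time. The tie also shows why a single weak signal, such as a human who repeats a commit message, is not enough, and why combining several signals is useful.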
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- BotHawk: An Approach for Bots Detection in Open Source Software Projects [4.59229477803039]
This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with maximum possible accuracy.
We've identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions.
Our team created BotHawk, a highly effective model for detecting bots in open-source software projects.
arXiv Detail & Related papers (2023-07-25T10:15:38Z)
- BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation.
This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges associated with machine learning model development.
We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z)
- BotShape: A Novel Social Bots Detection Approach via Behavioral Patterns [4.386183132284449]
Based on a real-world data set, we construct behavioral sequences from raw event logs.
We observe differences between bots and genuine users and similar patterns among bot accounts.
We present a novel social bot detection system BotShape, to automatically catch behavioral sequences and characteristics.
arXiv Detail & Related papers (2023-03-17T19:03:06Z)
- MulBot: Unsupervised Bot Detection Based on Multivariate Time Series [2.525739800601558]
MulBot is an unsupervised bot detector based on multidimensional temporal features extracted from user timelines.
We perform a binary classification task, achieving an F1-score of 0.99 and outperforming state-of-the-art methods.
We also demonstrate MulBot's strengths in a novel and practically-relevant task: detecting and separating different botnets.
arXiv Detail & Related papers (2022-09-21T13:56:12Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data-driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process.
This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts.
We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns.
arXiv Detail & Related papers (2020-10-07T09:30:52Z)
- Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion.
We show that different types of bots are characterized by different behavioral features.
We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)
- BeCAPTCHA-Mouse: Synthetic Mouse Trajectories and Improved Bot Detection [78.11535724645702]
We present BeCAPTCHA-Mouse, a bot detector based on a neuromotor model of mouse dynamics.
BeCAPTCHA-Mouse is able to detect highly realistic bot trajectories with 93% accuracy on average, using only one mouse trajectory.
arXiv Detail & Related papers (2020-05-02T17:40:49Z)
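The account-level features listed in the ground-truth GitHub issue and PR comments entry above are the number of empty and non-empty comments, the number of comment patterns, and the inequality between comments within patterns. The sketch below shows how they could be computed, but it is a hypothetical simplification and not that paper's method. Comments that are empty or whitespace-only count as empty. Non-empty comments are grouped into patterns by exact match after lowercasing and masking digits. Inequality is measured with the Gini coefficient of pattern sizes. The cited work may define patterns and inequality differently, for example by using a distance-based clustering of comments. The names `comment_features` and `gini` and the sample comments are illustrative only.

```python
import re
from collections import Counter

DIGITS = re.compile(r"\d+")

def gini(values) -> float:
    """Gini coefficient of non-negative counts: 0 when all are equal, larger when a few dominate."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

def comment_features(comments: list[str]) -> dict:
    """Account-level features: empty/non-empty counts, pattern count, pattern inequality."""
    non_empty = [c for c in comments if c.strip()]
    # Hypothetical pattern grouping: exact match after lowercasing and masking digits.
    patterns = Counter(DIGITS.sub("<N>", c.strip().lower()) for c in non_empty)
    return {
        "empty": len(comments) - len(non_empty),
        "non_empty": len(non_empty),
        "patterns": len(patterns),
        "inequality": gini(patterns.values()),
    }

# Hypothetical sample comments for two accounts.
bot_like = ["Coverage increased to 91%", "Coverage increased to 92%", "Coverage increased to 90%", ""]
human_like = ["LGTM", "Can you add a test?", "lgtm", "Thanks!"]
print(comment_features(bot_like))
print(comment_features(human_like))
```

In this sample, the bot-like account collapses three non-empty comments into a single pattern. The human-like account produces three patterns from four comments, with a small inequality of about 0.17 because "LGTM" appears twice.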