Related papers: Detecting and Characterizing Bots that Commit Code

URL: http://arxiv.org/abs/2003.03172v3
Date: Fri, 27 Mar 2020 20:47:56 GMT
Title: Detecting and Characterizing Bots that Commit Code
Authors: Tapajit Dey, Sara Mousavi, Eduardo Ponce, Tanner Fry, Bogdan Vasilescu, Anna Filippova, Audris Mockus
Abstract summary: We propose a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the commits. We have compiled a shareable dataset containing detailed information about 461 bots we found (all of whom have more than 1000 commits) and 13,762,430 commits they created.
Score: 16.10540443996897
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Background: Some developer activity traditionally performed manually, such as making code commits, opening, managing, or closing issues, is increasingly subject to automation in many OSS projects. Specifically, such activity is often performed by tools that react to events or run at specific times. We refer to such automation tools as bots and, in many software mining scenarios related to developer productivity or code quality, it is desirable to identify bots in order to separate their actions from actions of individuals. Aim: Find an automated way of identifying bots and code committed by these bots, and characterize the types of bots based on their activity patterns. Method and Result: We propose BIMAN, a systematic approach to detect bots using author names, commit messages, files modified by the commit, and projects associated with the commits. For our test data, the value for AUC-ROC was 0.9. We also characterized these bots based on the time patterns of their code commits and the types of files modified, and found that they primarily work with documentation files and web pages, and these files are most prevalent in HTML and JavaScript ecosystems. We have compiled a shareable dataset containing detailed information about 461 bots we found (all of whom have more than 1000 commits) and 13,762,430 commits they created.

Related papers

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
BotHawk: An Approach for Bots Detection in Open Source Software Projects [4.59229477803039]
This research aims to investigate bots' behavior in open-source software projects and identify bot accounts with maximum possible accuracy. We've identified four types of bot accounts in open-source software projects by analyzing their behavior across 17 features in 5 dimensions. Our team created BotHawk, a highly effective model for detecting bots in open-source software projects.
arXiv Detail & Related papers (2023-07-25T10:15:38Z)
BotArtist: Generic approach for bot detection in Twitter via semi-automatic machine learning pipeline [47.61306219245444]
Twitter has become a target for bots and fake accounts, resulting in the spread of false information and manipulation. This paper introduces a semi-automatic machine learning pipeline (SAMLP) designed to address the challenges correlated with machine learning model development. We develop a comprehensive bot detection model named BotArtist, based on user profile features.
arXiv Detail & Related papers (2023-05-31T09:12:35Z)
BotShape: A Novel Social Bots Detection Approach via Behavioral Patterns [4.386183132284449]
Based on a real-world data set, we construct behavioral sequences from raw event logs. We observe differences between bots and genuine users and similar patterns among bot accounts. We present a novel social bot detection system, BotShape, to automatically catch behavioral sequences and characteristics.
arXiv Detail & Related papers (2023-03-17T19:03:06Z)
MulBot: Unsupervised Bot Detection Based on Multivariate Time Series [2.525739800601558]
MulBot is an unsupervised bot detector based on multidimensional temporal features extracted from user timelines. We perform a binary classification task achieving f1-score $= 0.99$, outperforming state-of-the-art methods. We also demonstrate MulBot's strengths in a novel and practically-relevant task: detecting and separating different botnets.
arXiv Detail & Related papers (2022-09-21T13:56:12Z)
BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process. This paper proposes a ground-truth dataset, based on a manual analysis with high interrater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts. We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns.
arXiv Detail & Related papers (2020-10-07T09:30:52Z)
Detection of Novel Social Bots by Ensembles of Specialized Classifiers [60.63582690037839]
Malicious actors create inauthentic social media accounts controlled in part by algorithms, known as social bots, to disseminate misinformation and agitate online discussion. We show that different types of bots are characterized by different behavioral features. We propose a new supervised learning method that trains classifiers specialized for each class of bots and combines their decisions through the maximum rule.
arXiv Detail & Related papers (2020-06-11T22:59:59Z)
BeCAPTCHA-Mouse: Synthetic Mouse Trajectories and Improved Bot Detection [78.11535724645702]
We present BeCAPTCHA-Mouse, a bot detector based on a neuromotor model of mouse dynamics. BeCAPTCHA-Mouse is able to detect highly realistic bot trajectories with 93% accuracy on average using only one mouse trajectory.
arXiv Detail & Related papers (2020-05-02T17:40:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.