An Empirical Study of Dotfiles Repositories Containing User-Specific Configuration Files
- URL: http://arxiv.org/abs/2501.18555v1
- Date: Thu, 30 Jan 2025 18:32:46 GMT
- Title: An Empirical Study of Dotfiles Repositories Containing User-Specific Configuration Files
- Authors: Wenhan Zhu, Michael W. Godfrey
- Abstract summary: Hundreds of thousands choose to publicly host their repositories on GitHub. We collected and analyzed publicly-hosted dotfiles repositories on GitHub. We found that 25.8% of the top 500 most-starred GitHub users maintain some form of publicly accessible dotfiles repository.
- Score: 1.7556600627464058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Storing user-specific configuration files in a "dotfiles" repository is a common practice among software developers, with hundreds of thousands choosing to publicly host their repositories on GitHub. This practice not only provides developers with a simple backup mechanism for their essential configuration files, but also facilitates sharing ideas and learning from others on how best to configure applications that are key to their daily workflows. However, our current understanding of these repository sharing practices is limited and mostly anecdotal. To address this gap, we conducted a study to delve deeper into this phenomenon. Beginning with collecting and analyzing publicly-hosted dotfiles repositories on GitHub, we discovered that maintaining dotfiles is widespread among developers. Notably, we found that 25.8% of the top 500 most-starred GitHub users maintain some form of publicly accessible dotfiles repository. Among these, configurations for text editors like Vim and shells such as bash and zsh are the most commonly tracked. Our analysis reveals that updating dotfiles is primarily driven by the need to adjust configurations (63.3%) and project meta-management (25.4%). Surprisingly, we found no significant difference in the types of dotfiles observed across code churn history patterns, suggesting that the frequency of dotfile modifications depends more on the developer than the properties of the specific dotfile and its associated application. Finally, we discuss the challenges associated with managing dotfiles, including the necessity for a reliable and effective deployment mechanism, and how the insights gleaned from dotfiles can inform tool designers by offering real-world usage information.
Related papers
- On the Prevalence and Usage of Commit Signing on GitHub: A Longitudinal and Cross-Domain Study [1.834753484317836]
We study the presence of verified commits in GitHub repositories over five years.
Only 10% of all the commits in these 60 repositories are verified.
We propose ways to identify commit ownership based on GitHub's Events API.
arXiv Detail & Related papers (2025-04-27T12:39:50Z)
- Repository-level Code Search with Neural Retrieval Methods [25.222964965449286]
We define the task of repository-level code search as retrieving the set of files from the current state of a code repository that are most relevant to addressing a user's question or bug.
The proposed approach combines BM25-based retrieval over commit messages with neural reranking using CodeBERT to identify the most pertinent files.
Experiments on a new dataset created from 7 popular open-source repositories demonstrate substantial improvements of up to 80% in MAP, MRR and P@1 over the BM25 baseline.
arXiv Detail & Related papers (2025-02-10T21:59:01Z)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks.
SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues.
We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- Collaborative, Code-Proximal Dynamic Software Visualization within Code Editors [55.57032418885258]
This paper introduces the design and proof-of-concept implementation for a software visualization approach that can be embedded into code editors.
Our contribution differs from related work in that we use dynamic analysis of a software system's runtime behavior.
Our visualization approach enhances common remote pair programming tools and is collaboratively usable by employing shared code cities.
arXiv Detail & Related papers (2023-08-30T06:35:40Z)
- Exploring Security Practices in Infrastructure as Code: An Empirical Study [54.669404064111795]
Cloud computing has become popular thanks to the widespread use of Infrastructure as Code (IaC) tools.
The scripting process does not automatically prevent practitioners from introducing misconfigurations, vulnerabilities, or privacy risks.
Ensuring security relies on practitioners' understanding and adoption of explicit policies, guidelines, or best practices.
arXiv Detail & Related papers (2023-08-07T23:43:32Z)
- RepoFusion: Training Code Models to Understand Your Repository [12.621282610983592]
Large Language Models (LLMs) in coding assistants like GitHub Copilot struggle to understand the context present in the repository.
Recent work has shown the promise of using context from the repository during inference.
We propose RepoFusion, a framework to train models to incorporate relevant repository context.
arXiv Detail & Related papers (2023-06-19T15:05:31Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- Coeditor: Leveraging Contextual Changes for Multi-round Code Auto-editing [57.776971051512234]
In this work, we explore a multi-round code auto-editing setting, aiming to predict edits to a code region based on recent changes within the same.
Our model, Coeditor, is a fine-tuned language model specifically designed for code editing tasks.
In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models.
arXiv Detail & Related papers (2023-05-29T19:57:36Z)
- An Empirical Study on Workflows and Security Policies in Popular GitHub Repositories [9.048328480295224]
In open-source projects, anyone can contribute, so it is important to have an active continuous integration and continuous delivery (CI/CD) pipeline.
Many of these projects are hosted on GitHub, where maintainers can create automated security policies.
We measure the usage of GitHub workflows and security policies in thousands of popular repositories based on the number of stars.
arXiv Detail & Related papers (2023-05-25T14:52:23Z)
- Automatically Categorising GitHub Repositories by Application Domain [14.265666415804025]
GitHub is the largest host of open source software on the Internet.
It is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains.
Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository.
arXiv Detail & Related papers (2022-07-30T16:27:16Z)
- DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population [95.0099875111663]
DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction.
DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements.
arXiv Detail & Related papers (2022-01-10T13:29:05Z)
- Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity [2.095199622772379]
Repo2Vec is a comprehensive embedding approach to represent a repository as a distributed vector.
We evaluate our method with two real datasets from GitHub for a combined 1013 repositories.
arXiv Detail & Related papers (2021-07-11T18:57:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.