Fast and Accurate Heuristics for Bus-Factor Estimation
- URL: http://arxiv.org/abs/2508.09828v1
- Date: Wed, 13 Aug 2025 14:03:46 GMT
- Title: Fast and Accurate Heuristics for Bus-Factor Estimation
- Authors: Sebastiano Antonio Piccolo
- Abstract summary: The bus-factor is a critical risk indicator that quantifies how many key contributors a project can afford to lose before core knowledge or functionality is compromised. Despite its practical importance, accurately computing the bus-factor is NP-Hard under established formalizations, making scalable analysis infeasible for large software systems.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The bus-factor is a critical risk indicator that quantifies how many key contributors a project can afford to lose before core knowledge or functionality is compromised. Despite its practical importance, accurately computing the bus-factor is NP-Hard under established formalizations, making scalable analysis infeasible for large software systems. In this paper, we model software projects as bipartite graphs of developers and tasks and propose two novel approximation heuristics, Minimum Coverage and Maximum Coverage, based on iterative graph peeling, for two influential bus-factor formalizations. Our methods significantly outperform the widely adopted degree-based heuristic, which we show can yield severely inflated estimates. We conduct a comprehensive empirical evaluation on over 1,000 synthetic power-law graphs and demonstrate that our heuristics provide tighter estimates while scaling to graphs with millions of nodes and edges in minutes. Our results reveal that the proposed heuristics are not only more accurate but also robust to structural variations in the developer-task assignment graph. We release our implementation as open-source software to support future research and practical adoption.
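The abstract does not spell out how Minimum Coverage and Maximum Coverage work, so the sketch below does not reproduce them. It only illustrates the developer-task bipartite model and how a degree-based baseline can inflate the estimate relative to the exact minimum. It rests on stated assumptions: a task is lost once none of its developers remain, the project counts as compromised once at least half of its tasks are lost (the hypothetical `threshold` parameter), and the baseline removes developers in a fixed descending-degree order. All function names and the toy graph are hypothetical.

```python
from itertools import combinations

# Hypothetical toy model: each task maps to the set of developers who know it.
Graph = dict[str, set[str]]


def developers(graph: Graph) -> list[str]:
    """All developers appearing in the graph, sorted for deterministic output."""
    return sorted(set().union(*graph.values()))


def lost_fraction(graph: Graph, removed: set[str]) -> float:
    """Fraction of tasks left with no remaining developer after `removed` leave."""
    lost = sum(1 for devs in graph.values() if not (devs - removed))
    return lost / len(graph)


def degree_heuristic(graph: Graph, threshold: float = 0.5) -> int:
    """Assumed baseline: remove developers by descending degree until the threshold is hit."""
    degree = {d: sum(d in devs for devs in graph.values()) for d in developers(graph)}
    order = sorted(degree, key=lambda d: (-degree[d], d))  # ties broken by name
    removed: set[str] = set()
    for dev in order:
        removed.add(dev)
        if lost_fraction(graph, removed) >= threshold:
            return len(removed)
    return len(removed)


def exact_bus_factor(graph: Graph, threshold: float = 0.5) -> int:
    """Smallest group whose removal reaches the threshold; brute force over all subsets."""
    devs = developers(graph)
    for k in range(1, len(devs) + 1):
        for group in combinations(devs, k):
            if lost_fraction(graph, set(group)) >= threshold:
                return k
    return len(devs)


# Toy graph: "a" solely owns t1 and t2, "b" solely owns t3, and the hub "h"
# touches t4..t6, each of which is also known by one other developer.
tasks = {
    "t1": {"a"}, "t2": {"a"}, "t3": {"b"},
    "t4": {"h", "c"}, "t5": {"h", "d"}, "t6": {"h", "e"},
}
print(degree_heuristic(tasks))  # 3: removes h, a, b
print(exact_bus_factor(tasks))  # 2: removing {a, b} loses t1, t2, t3
```

Removing the high-degree hub first costs one developer without losing any task, because every task it touches has a backup. Any removal set that reaches the threshold is an upper bound on the minimum, so under these assumptions a greedy order can match or exceed the exact value but never undercut it. The brute force enumerates every developer subset, so its cost grows exponentially and it is only practical on toy graphs.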