Related papers: Detecting Multiple Semantic Concerns in Tangled Code Commits

Detecting Multiple Semantic Concerns in Tangled Code Commits

URL: http://arxiv.org/abs/2601.21298v1
Date: Thu, 29 Jan 2026 05:50:16 GMT
Title: Detecting Multiple Semantic Concerns in Tangled Code Commits
Authors: Beomsu Koh, Neil Walkinshaw, Donghwan Shin,
Abstract summary: Developers often bundle multiple concerns into tangled commits, obscuring intent and complicating maintenance.<n>Recent studies have used Conventional Commits Specification (CCS) and Language Models (LMs) to capture commit intent.<n>We present an empirical study using SLMs to detect multiple semantic concerns in tangled commits.
Score: 1.2578844450585998
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Code commits in a version control system (e.g., Git) should be atomic, i.e., focused on a single goal, such as adding a feature or fixing a bug. In practice, however, developers often bundle multiple concerns into tangled commits, obscuring intent and complicating maintenance. Recent studies have used Conventional Commits Specification (CCS) and Language Models (LMs) to capture commit intent, demonstrating that Small Language Models (SLMs) can approach the performance of Large Language Models (LLMs) while maintaining efficiency and privacy. However, they do not address tangled commits involving multiple concerns, leaving the feasibility of using LMs for multi-concern detection unresolved. In this paper, we frame multi-concern detection in tangled commits as a multi-label classification problem and construct a controlled dataset of artificially tangled commits based on real-world data. We then present an empirical study using SLMs to detect multiple semantic concerns in tangled commits, examining the effects of fine-tuning, concern count, commit-message inclusion, and header-preserving truncation under practical token-budget limits. Our results show that a fine-tuned 14B-parameter SLM is competitive with a state-of-the-art LLM for single-concern commits and remains usable for up to three concerns. In particular, including commit messages improves detection accuracy by up to 44% (in terms of Hamming Loss) with negligible latency overhead, establishing them as important semantic cues.

Related papers

IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation [49.796717294455796]
We present IMMACULATE, a practical auditing framework that detects economically motivated deviations.<n>IMMACULATE selectively audits a small fraction of requests using verifiable computation, achieving strong detection guarantees while amortizing cryptographic overhead.
arXiv Detail & Related papers (2026-02-26T07:21:02Z)
Sparse Semantic Dimension as a Generalization Certificate for LLMs [53.681678236115836]
We introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers.<n>We validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes.
arXiv Detail & Related papers (2026-02-11T21:45:18Z)
CodeFuse-CommitEval: Towards Benchmarking LLM's Power on Commit Message and Code Change Inconsistency Detection [8.631593963090985]
Version control relies on commit messages to convey the rationale for code changes, but these messages are often low quality and inconsistent with their diffs-known as message-code inconsistency (MCI)<n>We introduce CODEFUSE-COMMITEVAL, the first benchmark designed for MCI detection using large language models (LLMs)<n>We generate seven types of inconsistent messages through rule-guided mutations of originally consistent commits and apply two-fold validation to verify both positive and negative samples.
arXiv Detail & Related papers (2025-11-25T03:33:57Z)
LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline [35.18683484280968]
Large Language Models (LLMs) are well-positioned to break the barriers of existing solutions.<n>LLMs comprehend both textual data and code in patches and commits.<n>Our approach achieves significantly better accuracy than the state-of-the-art solution by more than 38%.
arXiv Detail & Related papers (2025-10-30T02:47:25Z)
Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset [0.0]
We present ReDef, a high-confidence benchmark of function-level modifications curated from 22 large-scale C/C++ projects.<n>Defective cases are anchored by revert commits, while clean cases are validated through post-hoc history checks.<n>This pipeline yields 3,164 defective and 10,268 clean modifications, offering substantially more reliable labels than prior existing resources.
arXiv Detail & Related papers (2025-09-11T07:07:11Z)
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward.<n>It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types.<n>We introduce VerifierBench benchmark comprising model outputs collected from multiple data sources, augmented through manual analysis of metaerror patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z)
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
evaluating constraint on every token can be prohibitively expensive.<n> LCD can distort the global distribution over strings, sampling tokens based only on local information.<n>We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z)
Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions [60.43398881149664]
We introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LLM Output Signature.<n>It achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency.
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detection of large language models (LLMs)<n>We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length.<n>PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
Using Large Language Models for Commit Message Generation: A Preliminary Study [5.5784148764236114]
Large language models (LLMs) can be used to generate commit messages automatically and effectively. In 78% of the 366 samples, the commit messages generated by LLMs were evaluated by humans as the best.
arXiv Detail & Related papers (2024-01-11T14:06:39Z)
Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models [13.605167159285374]
Commit message generation is a challenging task in automated software engineering. tool is a novel paradigm that can introduce the correlation between commits and issues into the training phase of models. The results show that compared with the original models, the performance of tool-enhanced models is significantly improved.
arXiv Detail & Related papers (2023-07-31T20:35:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.