Identifying Concurrency Bug Reports via Linguistic Patterns
- URL: http://arxiv.org/abs/2601.16338v1
- Date: Thu, 22 Jan 2026 21:54:14 GMT
- Title: Identifying Concurrency Bug Reports via Linguistic Patterns
- Authors: Shuai Shao, Lu Xiao, Tingting Yu,
- Abstract summary: This paper presents a linguistic-pattern-based framework for automatically identifying bug reports.<n>We derive 58 distinct linguistic patterns from 730 manually labeled bug reports.<n>We evaluate four complementary approaches-matching, learning, prompt-based, and fine-tuning-spanning traditional machine learning, large language models, and pre-trained language models.
- Score: 5.794959117360381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing ubiquity of multi-core architectures, concurrent systems have become essential but increasingly prone to complex issues such as data races and deadlocks. While modern issue-tracking systems facilitate the reporting of such problems, labeling concurrency-related bug reports remains a labor-intensive and error-prone task. This paper presents a linguistic-pattern-based framework for automatically identifying concurrency bug reports. We derive 58 distinct linguistic patterns from 730 manually labeled concurrency bug reports, organized across four levels: word-level (keywords), phrase-level (n-grams), sentence-level (semantic), and bug report-level (contextual). To assess their effectiveness, we evaluate four complementary approaches-matching, learning, prompt-based, and fine-tuning-spanning traditional machine learning, large language models (LLMs), and pre-trained language models (PLMs). Our comprehensive evaluation on 12 large-scale open-source projects (10,920 issue reports from GitHub and Jira) demonstrates that fine-tuning PLMs with linguistic-pattern-enriched inputs achieves the best performance, reaching a precision of 91% on GitHub and 93% on Jira, and maintaining strong precision on post cut-off data (91%). The contributions of this work include: (1) a comprehensive taxonomy of linguistic patterns for concurrency bugs, (2) a novel fine-tuning strategy that integrates domain-specific linguistic knowledge into PLMs, and (3) a curated, labeled dataset to support reproducible research. Together, these advances provide a foundation for improving the automation, precision, and interpretability of concurrency bug classification.
Related papers
- VisCoder2: Building Multi-Language Visualization Coding Agents [63.63232038173407]
We introduce three complementary resources for advancing visualization coding agents.<n>VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models.
arXiv Detail & Related papers (2025-10-24T18:03:57Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - Towards Typologically Aware Rescoring to Mitigate Unfaithfulness in Lower-Resource Languages [9.426642998924724]
Multilingual large language models generate non-faithful output in resource-constrained languages.<n>To mitigate unfaithfulness in such settings, we propose using computationally light auxiliary models to rescore the outputs of larger architectures.<n>We show that monolingual 4-layer BERT models pretrained from scratch on less than 700 MB of data without fine-tuning are able to identify faithful summaries with a mean accuracy of 88.33%.
arXiv Detail & Related papers (2025-02-24T21:22:19Z) - English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports [0.0]
This study is the first comprehensive evaluation of machine translation (MT) performance on bug reports.<n>We analyze the capabilities of DeepL, AWS Translate, and large language models such as ChatGPT, Claude, Gemini, LLaMA, and Mistral.<n>We employ a range of MT evaluation metrics-including BLEU, BERTScore, COMET, METEOR, and ROUGE-alongside classification metrics such as accuracy, precision, recall, and F1-score.
arXiv Detail & Related papers (2025-02-20T07:47:03Z) - EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking [58.15568681219339]
We introduce EquiBench, a new benchmark for evaluating large language models (LLMs)<n>This task directly tests a model's ability to reason about program semantics.<n>We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
arXiv Detail & Related papers (2025-02-18T02:54:25Z) - Enhancing IR-based Fault Localization using Large Language Models [5.032687557488094]
This paper enhances Fault Localization (IRFL) by categorizing bug reports based on programming entities, stack traces, and natural language text.<n>To address inaccuracies in queries, we introduce a user and conversational-based query reformulation approach, termed LLmiRQ+.<n> Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
arXiv Detail & Related papers (2024-12-04T22:47:51Z) - DeMuX: Data-efficient Multilingual Learning [57.37123046817781]
DEMUX is a framework that prescribes exact data-points to label from vast amounts of unlabelled multilingual data.
Our end-to-end framework is language-agnostic, accounts for model representations, and supports multilingual target configurations.
arXiv Detail & Related papers (2023-11-10T20:09:08Z) - OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource
Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves the state-of-the-art result.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z) - Detecting Requirements Smells With Deep Learning: Experiences,
Challenges and Future Work [9.44316959798363]
This work aims to improve the previous work by creating a manually labeled dataset and using ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning to overcome the generalization problem.
The current findings show that the dataset is unbalanced and which class examples should be added more.
arXiv Detail & Related papers (2021-08-06T12:45:15Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for
Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint)
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and
Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations.
GCNs struggle to model words with long-range dependencies or are not directly connected in the dependency tree.
We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z) - MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing
Benchmark [31.91964553419665]
We present a new multilingual dataset, called MTOP, comprising of 100k annotated utterances in 6 languages across 11 domains.
We achieve an average improvement of +6.3 points on Slot F1 for the two existing multilingual datasets, over best results reported in their experiments.
We demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.
arXiv Detail & Related papers (2020-08-21T07:02:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.