Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering
- URL: http://arxiv.org/abs/2503.03609v1
- Date: Wed, 05 Mar 2025 15:47:22 GMT
- Title: Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering
- Authors: Lingli Cao, He Zhang, Shanshan Li, Danyang Li, Yanjing Yang, Chenxing Zhong, Xin Zhou, Yue Xie
- Abstract summary: Architectural tactics (ATs) address non-functional requirements of software systems. We propose Prmt4TD, a small model-augmented prompting framework to enhance the accuracy and comprehensibility of ATs detection. Our evaluation results demonstrate that Prmt4TD achieves an accuracy (F1-score) improvement of 13%-23% on the ATs balanced dataset.
- Score: 12.554418096667856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Architectural tactics (ATs), as the concrete implementation of architectural decisions in code, address non-functional requirements of software systems. Due to the implicit nature of architectural knowledge in code implementation, developers may risk inadvertently altering or removing these tactics during code modifications or optimizations. Such unintended changes can trigger architectural erosion, gradually undermining the system's original design. While many researchers have proposed machine learning-based methods to improve the accuracy of detecting ATs in code, the black-box nature and the required architectural domain knowledge pose significant challenges for developers in verifying the results. Effective verification requires not only accurate detection results but also interpretable explanations that enhance their comprehensibility. However, this is a critical gap in current research. Large language models (LLMs) can generate easily interpretable ATs detection comments if they have domain knowledge. Fine-tuning LLMs to acquire domain knowledge faces challenges such as catastrophic forgetting and hardware constraints. Thus, we propose Prmt4TD, a small model-augmented prompting framework to enhance the accuracy and comprehensibility of ATs detection. Combining fine-tuned small models with In-Context Learning can also reduce fine-tuning costs while equipping the LLM with additional domain knowledge. Prmt4TD can leverage the remarkable processing and reasoning capabilities of LLMs to generate easily interpretable ATs detection results. Our evaluation results demonstrate that Prmt4TD achieves an accuracy (F1-score) improvement of 13%-23% on the ATs balanced dataset and enhances the comprehensibility of the detection results.
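To make the pipeline described in the abstract concrete, the Python sketch below shows one way a small model-augmented prompt could be assembled for AT detection: a fine-tuned small model proposes a candidate tactic with a confidence score, labeled snippets serve as in-context demonstrations, and the LLM is asked to confirm or correct the label and explain its reasoning. This is an illustrative reading of the abstract, not the authors' Prmt4TD implementation; the names (SmallATClassifier, build_prompt, detect_tactic, query_llm) and the prompt wording are hypothetical.

```python
# Minimal sketch (not the authors' code) of small model-augmented prompting
# for architectural-tactic (AT) detection, following the abstract's pipeline.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demonstration:
    code: str        # labeled code snippet used as an in-context example
    tactic: str      # ground-truth architectural tactic, e.g. "Heartbeat"
    rationale: str   # short human-written explanation


class SmallATClassifier:
    """Stand-in for a fine-tuned small model that proposes a tactic label
    with a confidence score; a real implementation would run the model here."""

    def predict(self, code: str) -> Tuple[str, float]:
        raise NotImplementedError


def build_prompt(code: str,
                 candidate: Tuple[str, float],
                 demos: List[Demonstration]) -> str:
    """Assemble the augmented prompt: demonstrations, the small model's
    candidate label, and the code under analysis."""
    label, confidence = candidate
    demo_text = "\n\n".join(
        f"Code:\n{d.code}\nTactic: {d.tactic}\nWhy: {d.rationale}"
        for d in demos
    )
    return (
        "You are a software architecture assistant. Decide which "
        "architectural tactic the code implements and explain why.\n\n"
        f"Examples:\n{demo_text}\n\n"
        f"A fine-tuned classifier suggests '{label}' "
        f"(confidence {confidence:.2f}); verify or correct it.\n\n"
        f"Code:\n{code}\n"
        "Answer with the tactic name and a short justification."
    )


def detect_tactic(code: str,
                  small_model: SmallATClassifier,
                  demos: List[Demonstration],
                  query_llm) -> str:
    """End-to-end sketch: small-model prediction -> augmented prompt -> LLM."""
    candidate = small_model.predict(code)
    prompt = build_prompt(code, candidate, demos)
    return query_llm(prompt)  # query_llm wraps whatever LLM API is available
```

In this sketch the small model contributes a candidate label and confidence that the LLM can confirm or override, which is one plausible way to combine a fine-tuned classifier with in-context learning as the abstract describes.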
Related papers
- Understanding and Mitigating Errors of LLM-Generated RTL Code [7.747889860813149]
Large language model (LLM) based register-transfer-level (RTL) code generation is promising but the overall success rate remains unsatisfactory. We conduct a comprehensive error analysis and manual categorization. Our findings reveal that most errors stem from insufficient RTL programming knowledge, poor understanding of circuit concepts, or misinterpretation of complex multimodal inputs.
arXiv Detail & Related papers (2025-08-07T11:02:32Z) - Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [53.16213723669751]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding. However, their direct deployment is often hindered by high computational complexity and resource requirements. This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z) - Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs [11.724887822269528]
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora. Their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research. We propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs.
arXiv Detail & Related papers (2025-05-12T02:21:36Z) - Learning atomic forces from uncertainty-calibrated adversarial attacks [0.0]
We propose the Calibrated Adversarial Geometry Optimization (CAGO) algorithm to discover adversarial structures with user-assigned errors. By performing geometry optimization for uncertainty, we reach adversarial structures with the user-assigned target MLIP prediction error.
arXiv Detail & Related papers (2025-02-25T16:03:59Z) - Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z) - Architectural Flaw Detection in Civil Engineering Using GPT-4 [0.8463972278020965]
This paper investigates the potential of the advanced LLM GPT-4 Turbo vision model in detecting architectural flaws during the design phase.
The study evaluates the model's performance through metrics such as precision, recall, and F1 score.
The findings highlight how AI can significantly improve design accuracy, reduce costly revisions, and support sustainable practices.
arXiv Detail & Related papers (2024-10-26T01:10:04Z) - Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models [20.31388126105889]
DesiGNN is a knowledge-centered framework that converts past model design experiences into structured, fine-grained knowledge priors. By constructing a solid meta-knowledge between unseen graph understanding and known effective architecture patterns, DesiGNN can deliver top-5.77% initial model proposals for unseen datasets within seconds.
arXiv Detail & Related papers (2024-08-13T08:22:01Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Towards Explainable Vulnerability Detection with Large Language Models [17.96542494363619]
Software vulnerabilities pose significant risks to the security and integrity of software systems. The advent of large language models (LLMs) has introduced transformative potential due to their advanced generative capabilities. In this paper, we propose LLMVulExp, an automated framework designed to specialize LLMs for the dual tasks of vulnerability detection and explanation.
arXiv Detail & Related papers (2024-06-14T04:01:25Z) - An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored.
Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models.
We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z) - To Err is Machine: Vulnerability Detection Challenges LLM Reasoning [8.602355712876815]
We present a challenging code reasoning task: vulnerability detection. State-of-the-art (SOTA) models reported only 54.5% Balanced Accuracy in our vulnerability detection evaluation. New models, new training methods, or more execution-specific pretraining data may be needed to conquer vulnerability detection.
arXiv Detail & Related papers (2024-03-25T21:47:36Z) - A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z) - Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)