Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering
- URL: http://arxiv.org/abs/2503.03609v1
- Date: Wed, 05 Mar 2025 15:47:22 GMT
- Title: Enhancing the Accuracy and Comprehensibility in Architectural Tactics Detection via Small Model-Augmented Prompt Engineering
- Authors: Lingli Cao, He Zhang, Shanshan Li, Danyang Li, Yanjing Yang, Chenxing Zhong, Xin Zhou, Yue Xie
- Abstract summary: Architectural tactics (ATs) address non-functional requirements of software systems. We propose Prmt4TD, a small model-augmented prompting framework to enhance the accuracy and comprehensibility of ATs detection. Our evaluation results demonstrate that Prmt4TD achieves an accuracy (F1-score) improvement of 13%-23% on the ATs balanced dataset.
- Score: 12.554418096667856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Architectural tactics (ATs), as the concrete implementation of architectural decisions in code, address non-functional requirements of software systems. Due to the implicit nature of architectural knowledge in code implementation, developers may risk inadvertently altering or removing these tactics during code modifications or optimizations. Such unintended changes can trigger architectural erosion, gradually undermining the system's original design. While many researchers have proposed machine learning-based methods to improve the accuracy of detecting ATs in code, the black-box nature and the required architectural domain knowledge pose significant challenges for developers in verifying the results. Effective verification requires not only accurate detection results but also interpretable explanations that enhance their comprehensibility. However, this is a critical gap in current research. Large language models (LLMs) can generate easily interpretable ATs detection comments if they have domain knowledge. Fine-tuning LLMs to acquire domain knowledge faces challenges such as catastrophic forgetting and hardware constraints. Thus, we propose Prmt4TD, a small model-augmented prompting framework to enhance the accuracy and comprehensibility of ATs detection. Combining fine-tuned small models with In-Context Learning can also reduce fine-tuning costs while equipping the LLM with additional domain knowledge. Prmt4TD can leverage the remarkable processing and reasoning capabilities of LLMs to generate easily interpretable ATs detection results. Our evaluation results demonstrate that Prmt4TD achieves an accuracy (F1-score) improvement of 13%-23% on the ATs balanced dataset and enhances the comprehensibility of the detection results.
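To make the pipeline described in the abstract concrete, the Python sketch below shows one way a small model-augmented prompt could be assembled for AT detection: a fine-tuned small model proposes a candidate tactic with a confidence score, labeled snippets serve as in-context demonstrations, and the LLM is asked to confirm or correct the label and explain its reasoning. This is an illustrative reading of the abstract, not the authors' Prmt4TD implementation; the names (SmallATClassifier, build_prompt, detect_tactic, query_llm) and the prompt wording are hypothetical.

```python
# Minimal sketch (not the authors' code) of small model-augmented prompting
# for architectural-tactic (AT) detection, following the abstract's pipeline.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Demonstration:
    code: str        # labeled code snippet used as an in-context example
    tactic: str      # ground-truth architectural tactic, e.g. "Heartbeat"
    rationale: str   # short human-written explanation


class SmallATClassifier:
    """Stand-in for a fine-tuned small model that proposes a tactic label
    with a confidence score; a real implementation would run the model here."""

    def predict(self, code: str) -> Tuple[str, float]:
        raise NotImplementedError


def build_prompt(code: str,
                 candidate: Tuple[str, float],
                 demos: List[Demonstration]) -> str:
    """Assemble the augmented prompt: demonstrations, the small model's
    candidate label, and the code under analysis."""
    label, confidence = candidate
    demo_text = "\n\n".join(
        f"Code:\n{d.code}\nTactic: {d.tactic}\nWhy: {d.rationale}"
        for d in demos
    )
    return (
        "You are a software architecture assistant. Decide which "
        "architectural tactic the code implements and explain why.\n\n"
        f"Examples:\n{demo_text}\n\n"
        f"A fine-tuned classifier suggests '{label}' "
        f"(confidence {confidence:.2f}); verify or correct it.\n\n"
        f"Code:\n{code}\n"
        "Answer with the tactic name and a short justification."
    )


def detect_tactic(code: str,
                  small_model: SmallATClassifier,
                  demos: List[Demonstration],
                  query_llm) -> str:
    """End-to-end sketch: small-model prediction -> augmented prompt -> LLM."""
    candidate = small_model.predict(code)
    prompt = build_prompt(code, candidate, demos)
    return query_llm(prompt)  # query_llm wraps whatever LLM API is available
```

In this sketch the small model contributes a candidate label and confidence that the LLM can confirm or override, which is one plausible way to combine a fine-tuned classifier with in-context learning as the abstract describes.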
Related papers
- Understanding and Mitigating Errors of LLM-Generated RTL Code [7.747889860813149]
Large language model (LLM) based register-transfer-level (RTL) code generation is promising but the overall success rate remains unsatisfactory. We conduct a comprehensive error analysis and manual categorization. Our findings reveal that most errors stem from insufficient RTL programming knowledge, poor understanding of circuit concepts, or misinterpretation of complex multimodal inputs.
arXiv Detail & Related papers (2025-08-07T11:02:32Z) - Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [53.16213723669751]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding. However, their direct deployment is often hindered by high computational complexity and resource requirements. This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z) - Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs [11.724887822269528]
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora. Their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research. We propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs.
arXiv Detail & Related papers (2025-05-12T02:21:36Z) - Learning atomic forces from uncertainty-calibrated adversarial attacks [0.0]
We propose the Calibrated Adversarial Geometry Optimization (CAGO) algorithm to discover adversarial structures with user-assigned errors. By performing geometry optimization for uncertainty, we reach adversarial structures with the user-assigned target MLIP prediction error.
arXiv Detail & Related papers (2025-02-25T16:03:59Z) - Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z) - Architectural Flaw Detection in Civil Engineering Using GPT-4 [0.8463972278020965]
This paper investigates the potential of the advanced LLM GPT-4 Turbo vision model in detecting architectural flaws during the design phase.
The study evaluates the model's performance through metrics such as precision, recall, and F1 score.
The findings highlight how AI can significantly improve design accuracy, reduce costly revisions, and support sustainable practices.
arXiv Detail & Related papers (2024-10-26T01:10:04Z) - Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z) - Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH with only 4.5K training samples.
arXiv Detail & Related papers (2024-10-09T07:43:38Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models [20.31388126105889]
DesiGNN is a knowledge-centered framework that converts past model design experiences into structured, fine-grained knowledge priors. By constructing a solid meta-knowledge between unseen graph understanding and known effective architecture patterns, DesiGNN can deliver top-5.77% initial model proposals for unseen datasets within seconds.
arXiv Detail & Related papers (2024-08-13T08:22:01Z) - AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies. We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z) - Towards Explainable Vulnerability Detection with Large Language Models [17.96542494363619]
Software vulnerabilities pose significant risks to the security and integrity of software systems. The advent of large language models (LLMs) has introduced transformative potential due to their advanced generative capabilities. In this paper, we propose LLMVulExp, an automated framework designed to specialize LLMs for the dual tasks of vulnerability detection and explanation.
arXiv Detail & Related papers (2024-06-14T04:01:25Z) - An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored.
Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models.
We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z) - To Err is Machine: Vulnerability Detection Challenges LLM Reasoning [8.602355712876815]
We present a challenging code reasoning task: vulnerability detection. State-of-the-art (SOTA) models reported only 54.5% Balanced Accuracy in our vulnerability detection evaluation. New models, new training methods, or more execution-specific pretraining data may be needed to conquer vulnerability detection.
arXiv Detail & Related papers (2024-03-25T21:47:36Z) - A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z) - Federated Learning with Unreliable Clients: Performance Analysis and Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low-quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z)