Towards Automated Identification of Violation Symptoms of Architecture Erosion
- URL: http://arxiv.org/abs/2306.08616v5
- Date: Fri, 22 Aug 2025 18:38:53 GMT
- Title: Towards Automated Identification of Violation Symptoms of Architecture Erosion
- Authors: Ruiyin Li, Peng Liang, Paris Avgeriou
- Abstract summary: This paper explores the automated identification of violation symptoms from developer discussions in code reviews. We developed 15 machine learning-based classifiers using pre-trained word embeddings and evaluated them on code review comments. Results show that SVM with word2vec achieved the best ML/DL performance with an F1-score of 0.779, while fastText embeddings also yielded strong results.
- Score: 2.915855887948474
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Architecture erosion has a detrimental effect on maintenance and evolution, as the implementation deviates from the intended architecture. Detecting symptoms of erosion, particularly architectural violations, at an early stage is crucial. This paper explores the automated identification of violation symptoms from developer discussions in code reviews. We developed 15 machine learning-based and 4 deep learning-based classifiers using three pre-trained word embeddings, and evaluated them on code review comments from four large open-source projects (OpenStack Nova/Neutron and Qt Base/Creator). To validate practical value, we conducted surveys and semi-structured interviews with developers involved in these discussions. We further compared traditional ML/DL classifiers with Large Language Models (LLMs) such as GPT-4o, Qwen-2.5, and DeepSeek-R1. Results show that SVM with word2vec achieved the best ML/DL performance with an F1-score of 0.779, while fastText embeddings also yielded strong results. Ensemble voting strategies enhanced traditional classifiers, and 200-dimensional embeddings generally outperformed 100/300-dimensional ones. LLM-based classifiers consistently surpassed ML/DL models, with GPT-4o achieving the best F1-score of 0.851, though ensembles added no further benefits. Overall, our study provides an automated approach to identify architecture violation symptoms, offers systematic comparisons of ML/DL and LLM methods, and delivers practitioner insights, contributing to sustainable architectural conformance in software systems.
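The abstract's pipeline — mean-pooled word embeddings fed to a classifier, combined by ensemble voting, and scored by F1 — can be sketched in a few lines of plain Python. This is a toy illustration only: the embedding vectors, the comment tokens, and the stub predictions below are invented, not taken from the paper's data or models.

```python
# Sketch: mean-pool toy word vectors into a comment vector, combine stub
# classifier outputs by majority (hard) voting, and compute an F1-score.
# All vectors and labels are illustrative placeholders.

# Toy stand-in for pretrained word2vec vectors (2-dimensional for brevity).
EMB = {
    "violates": [1.0, 0.0], "layer": [0.8, 0.2],
    "rename":   [0.0, 1.0], "typo":  [0.1, 0.9],
}

def mean_pool(tokens):
    """Average the vectors of known tokens (zero vector if none known)."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def majority_vote(predictions):
    """Hard-voting ensemble: the label predicted by most classifiers wins."""
    return max(set(predictions), key=predictions.count)

def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive == p for t, p in zip(y_true, y_pred))
    fp = sum(p == positive != t for t, p in zip(y_true, y_pred))
    fn = sum(t == positive != p for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

vec = mean_pool(["violates", "layer"])      # averages the two toy vectors
vote = majority_vote([1, 1, 0])             # two of three flag a violation
f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])   # tp=2, fp=0, fn=1
```

In the paper the comment vectors feed an SVM rather than stub predictors, but the pooling, voting, and scoring steps are the same shape.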
Related papers
- BRIDGE: Building Representations In Domain Guided Program Verification [67.36686119518441]
BRIDGE decomposes verification into three interconnected domains: Code, Specifications, and Proofs. We show that this approach substantially improves both accuracy and efficiency beyond standard error feedback methods.
arXiv Detail & Related papers (2025-11-26T06:39:19Z) - Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy [0.0]
This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's taxonomy. A small dataset of 600 sentences labeled with six cognitive categories was processed using traditional machine learning (ML) models.
arXiv Detail & Related papers (2025-11-14T02:31:12Z) - Reinforcement Learning with Rubric Anchors [26.9944158097067]
Reinforcement Learning from Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing Large Language Models (LLMs). We extend the RLVR paradigm to open-ended tasks by integrating rubric-based rewards. We construct, to our knowledge, the largest rubric reward system to date, with over 10,000 rubrics from humans, LLMs, or a hybrid human-LLM collaboration.
arXiv Detail & Related papers (2025-08-18T10:06:08Z) - When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen, and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Sentinel: SOTA model to protect against prompt injections [0.0]
Large Language Models (LLMs) are increasingly powerful but are vulnerable to prompt injection attacks. This paper introduces Sentinel, a novel detection model based on the answerdotai/ModernBERT-large architecture. On a comprehensive, unseen internal test set, Sentinel demonstrates an average accuracy of 0.987 and an F1-score of 0.980.
arXiv Detail & Related papers (2025-06-05T14:07:15Z) - VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation [0.8087612190556891]
VADER comprises 174 real-world software vulnerabilities, each carefully curated from GitHub and annotated by security experts. For each vulnerability case, models are tasked with identifying the flaw, classifying it using the Common Weakness Enumeration (CWE), explaining its underlying cause, proposing a patch, and formulating a test plan. Using a one-shot prompting strategy, we benchmark six state-of-the-art LLMs (Claude 3.7 Sonnet, Gemini 2.5 Pro, GPT-4.1, GPT-4.5, Grok 3 Beta, and o3) on VADER. Our results show that current state-of-the-
arXiv Detail & Related papers (2025-05-26T01:20:44Z) - ASMA-Tune: Unlocking LLMs' Assembly Code Comprehension via Structural-Semantic Instruction Tuning [33.53059396922164]
Assembly code analysis and comprehension play critical roles in applications like reverse engineering. Traditional masked language modeling approaches do not explicitly focus on natural language interaction. We present Assembly Augmented Tuning, an end-to-end structural-semantic instruction tuning framework.
arXiv Detail & Related papers (2025-03-14T17:36:08Z) - From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education [24.970741456147447]
Large Language Models (LLMs) have demonstrated impressive mathematical reasoning capabilities, achieving near-perfect performance on benchmarks like GSM8K.
However, their application in personalized education remains limited due to an overemphasis on correctness over error diagnosis and feedback generation.
First, we introduce MathCCS, a benchmark designed for systematic error analysis and tailored feedback.
Second, we develop a sequential error analysis framework that leverages historical data to track trends and improve diagnostic precision.
Third, we propose a multi-agent collaborative framework that combines a Time Series Agent for historical analysis and an MLLM Agent for real-
arXiv Detail & Related papers (2025-02-19T14:57:51Z) - Shortcut Learning Susceptibility in Vision Classifiers [3.004632712148892]
Shortcut learning is where machine learning models exploit spurious correlations in data instead of capturing meaningful features.
This phenomenon is prevalent across various machine learning applications, including vision, natural language processing, and speech recognition.
We systematically evaluate these architectures by introducing deliberate shortcuts into the dataset that are positionally correlated with class labels.
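The evaluation step described above — deliberately injecting a shortcut that is positionally correlated with the class label — can be caricatured in a few lines. This is a generic sketch of the idea, not the paper's actual protocol: the image size, marker value, and label-to-corner scheme are invented for illustration.

```python
# Sketch of injecting a positional shortcut: stamp a marker pixel whose
# corner position is perfectly correlated with the class label, so a
# model can "cheat" by reading the corner instead of the image content.

def inject_shortcut(image, label):
    """Return a copy of `image` (a list of pixel rows) with a marker:
    label 0 -> top-left corner, label 1 -> bottom-right corner."""
    img = [row[:] for row in image]  # deep-enough copy for a 2D grid
    if label == 0:
        img[0][0] = 255
    else:
        img[-1][-1] = 255
    return img

blank = [[0] * 4 for _ in range(4)]     # a 4x4 all-zero "image"
a = inject_shortcut(blank, 0)           # marker lands at a[0][0]
b = inject_shortcut(blank, 1)           # marker lands at b[-1][-1]
```

A classifier that scores perfectly on such data but collapses when the markers are shuffled has learned the shortcut rather than the content, which is what the benchmark probes.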
arXiv Detail & Related papers (2025-02-13T10:25:52Z) - LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. The LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z) - Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z) - Program Slicing in the Era of Large Language Models [7.990456190723922]
Program slicing is a critical technique in software engineering, enabling developers to isolate relevant portions of code.
This study investigates the application of large language models (LLMs) to both static and dynamic program slicing.
arXiv Detail & Related papers (2024-09-19T00:07:56Z) - Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z) - The All-Seeing Project V2: Towards General Relation Comprehension of the Open World [58.40101895719467]
We present the All-Seeing Project V2, a new model and dataset designed for understanding object relations in images.
We propose the All-Seeing Model V2 that integrates the formulation of text generation, object localization, and relation comprehension into a relation conversation task.
Our model excels not only in perceiving and recognizing all objects within the image but also in grasping the intricate relation graph between them.
arXiv Detail & Related papers (2024-02-29T18:59:17Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
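S3A itself is considerably more involved, but the generic zero-shot step that VLM-based methods build on is easy to state: embed the input and every candidate class name in a shared space, then pick the class whose embedding is most similar. A minimal sketch, with all vectors invented for illustration:

```python
import math

# Generic zero-shot classification step: score an input embedding
# against each class-name embedding by cosine similarity and pick
# the best match. The vectors below are made-up placeholders.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_classify(input_vec, class_vecs):
    """Return the class name whose embedding is most similar to the input."""
    return max(class_vecs, key=lambda name: cosine(input_vec, class_vecs[name]))

class_vecs = {"cat": [1.0, 0.1], "airplane": [0.0, 1.0]}
label = zero_shot_classify([0.9, 0.2], class_vecs)  # closest to "cat"
```

The "broad vocabulary, no annotation" setting above makes `class_vecs` large and noisy, which is exactly the difficulty the S3A framework targets.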
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Neural Architecture Design and Robustness: A Dataset [11.83842808044211]
We introduce a database on neural architecture design and robustness evaluations.
We evaluate all these networks on a range of common adversarial attacks and corruption types.
We find that carefully crafting the topology of a network can have substantial impact on its robustness.
arXiv Detail & Related papers (2023-06-11T16:02:14Z) - Fine-Grained ImageNet Classification in the Wild [0.0]
Robustness tests can uncover several vulnerabilities and biases which go unnoticed during the typical model evaluation stage.
In our work, we perform fine-grained classification on closely related categories, which are identified with the help of hierarchical knowledge.
arXiv Detail & Related papers (2023-03-04T12:25:07Z) - Learning from Mistakes: Self-Regularizing Hierarchical Representations in Point Cloud Semantic Segmentation [15.353256018248103]
LiDAR semantic segmentation has gained attention to accomplish fine-grained scene understanding.
We present a coarse-to-fine setup that LEArns from classification mistaKes (LEAK) derived from a standard model.
Our LEAK approach is very general and can be seamlessly applied on top of any segmentation architecture.
arXiv Detail & Related papers (2023-01-26T14:52:30Z) - Warnings: Violation Symptoms Indicating Architecture Erosion [2.6580082406002705]
We investigated the characteristics of architecture violation symptoms in code review comments from the developers' perspective.
Ten categories of violation symptoms are discussed by developers during the code review process.
The most frequently-used linguistic pattern is Problem Discovery.
arXiv Detail & Related papers (2022-12-23T06:29:55Z) - A Context-Sensitive Word Embedding Approach for The Detection of Troll Tweets [0.0]
We develop and evaluate a set of model architectures for the automatic detection of troll tweets.
BERT and ELMo embedding methods performed better than the GloVe method.
CNN and GRU encoders performed similarly in terms of F1 score and AUC.
The best-performing method was found to be an ELMo-based architecture that employed a GRU classifier, with an AUC score of 0.929.
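The entry above reports results in both F1 and AUC. As a reminder of what the AUC number measures, here is a minimal pure-Python computation using its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The labels and scores below are invented, not from the paper:

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented scores for 3 troll (1) and 2 non-troll (0) tweets:
# 5 of the 6 positive/negative pairs are ranked correctly.
score = auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.2])
```

An AUC of 0.929, as reported for the ELMo+GRU model, means roughly 93% of such pairs are ordered correctly.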
arXiv Detail & Related papers (2022-07-17T17:12:16Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming the state-of-the-arts that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
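The second stage of the two-stage framework above — building a one-class classifier on fixed learned representations — can be caricatured with the simplest possible scorer: distance to the centroid of the one-class training data, thresholded to separate inliers from outliers. The representations and threshold below are invented for illustration; the paper's actual classifiers are more sophisticated:

```python
import math

# Sketch of one-class classification on fixed representations:
# score a new point by distance to the centroid of the "normal"
# training data, and flag points past a threshold as outliers.

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_class_predict(train, x, threshold):
    """1 = inlier (belongs to the class), 0 = outlier."""
    return 1 if dist(centroid(train), x) <= threshold else 0

train = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]       # one-class data
near = one_class_predict(train, [1.1, 0.9], 0.5)   # close to centroid
far = one_class_predict(train, [5.0, 5.0], 0.5)    # far from centroid
```

The quality of the first stage — the self-supervised representations — determines whether such a simple geometric scorer can work at all, which is the point of learning them before classification.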
arXiv Detail & Related papers (2020-11-04T23:33:41Z) - Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes [81.85509264573948]
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
arXiv Detail & Related papers (2020-10-26T17:42:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.