GEML: A Grammar-based Evolutionary Machine Learning Approach for
Design-Pattern Detection
- URL: http://arxiv.org/abs/2401.07042v1
- Date: Sat, 13 Jan 2024 11:05:24 GMT
- Title: GEML: A Grammar-based Evolutionary Machine Learning Approach for
Design-Pattern Detection
- Authors: Rafael Barbudo and Aurora Ramírez and Francisco Servant and José
Raúl Romero
- Abstract summary: Design patterns (DPs) are recognised as a good practice in software development.
The lack of appropriate documentation often hampers traceability, and their benefits are blurred among thousands of lines of code.
We propose GEML, a novel detection approach based on evolutionary machine learning using software properties of diverse nature.
- Score: 7.018591019975254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Design patterns (DPs) are recognised as a good practice in software
development. However, the lack of appropriate documentation often hampers
traceability, and their benefits are blurred among thousands of lines of code.
Automatic methods for DP detection have become relevant but are usually based
on the rigid analysis of either software metrics or specific properties of the
source code. We propose GEML, a novel detection approach based on evolutionary
machine learning using software properties of diverse nature. Firstly, GEML
makes use of an evolutionary algorithm to extract those characteristics that
better describe the DP, formulated in terms of human-readable rules, whose
syntax is conformant with a context-free grammar. Secondly, a rule-based
classifier is built to predict whether new code contains a hidden DP
implementation. GEML has been validated over five DPs taken from a public
repository recurrently adopted by machine learning studies. Then, we increase
this number up to 15 diverse DPs, showing its effectiveness and robustness in
terms of detection capability. An initial parameter study served to tune a
parameter setup whose performance guarantees the general applicability of this
approach without the need to adjust complex parameters to a specific pattern.
Finally, a demonstration tool is also provided.
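The two steps described in the abstract (grammar-conformant rules, then a rule-based prediction) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy grammar, the property names (`num_methods`, `num_children`, `is_abstract`), the accuracy-based fitness, and the labelled samples are hypothetical and are not taken from GEML. Plain random sampling stands in for the evolutionary search, so crossover and mutation are omitted.

```python
import operator
import random

# Hypothetical toy grammar: property names and values are illustrative
# assumptions, not the grammar used by GEML.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", "AND", "<rule>"]],
    "<cond>": [["<prop>", "<op>", "<value>"]],
    "<prop>": [["num_methods"], ["num_children"], ["is_abstract"]],
    "<op>": [[">="], ["<="], ["=="]],
    "<value>": [["0"], ["1"], ["2"], ["5"]],
}
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def derive(symbol, rng, depth=0, max_depth=3):
    """Randomly expand a grammar symbol into a flat list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Keep only the shortest productions so the recursion terminates.
        shortest = min(len(p) for p in productions)
        productions = [p for p in productions if len(p) == shortest]
    tokens = []
    for sym in rng.choice(productions):
        tokens.extend(derive(sym, rng, depth + 1, max_depth))
    return tokens


def matches(tokens, props):
    """True if every condition of the AND-joined rule holds for `props`."""
    # Conditions are 3 tokens long and separated by a single "AND" token.
    conds = [tokens[i:i + 3] for i in range(0, len(tokens), 4)]
    return all(OPS[op](props[name], int(value)) for name, op, value in conds)


def fitness(tokens, samples):
    """Fraction of labelled samples that the rule classifies correctly."""
    hits = sum(matches(tokens, props) == label for props, label in samples)
    return hits / len(samples)


# Hypothetical labelled samples: (software properties, contains the pattern?)
samples = [
    ({"num_methods": 5, "num_children": 2, "is_abstract": 1}, True),
    ({"num_methods": 6, "num_children": 3, "is_abstract": 1}, True),
    ({"num_methods": 1, "num_children": 0, "is_abstract": 0}, False),
    ({"num_methods": 4, "num_children": 0, "is_abstract": 0}, False),
]

rng = random.Random(0)
candidates = [derive("<rule>", rng) for _ in range(200)]
best = max(candidates, key=lambda rule: fitness(rule, samples))
print(" ".join(best), fitness(best, samples))
```

Because each rule is a readable token sequence derived from the grammar, the best candidate can be inspected directly, which is the property the abstract refers to as human-readable rules.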
Related papers
- Sparse Semantic Dimension as a Generalization Certificate for LLMs [53.681678236115836]
We introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. We validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes.
arXiv Detail & Related papers (2026-02-11T21:45:18Z) - SSVD: Structured SVD for Parameter-Efficient Fine-Tuning and Benchmarking under Domain Shift in ASR [65.90944188787786]
Low-rank adaptation (LoRA) is widely used in speech applications, but its state-of-the-art variants, e.g., VeRA, DoRA, PiSSA, and SVFT, are developed mainly for language and vision tasks, with limited validation in speech. This work presents the first comprehensive integration and benchmarking of these PEFT methods within ESPnet. We evaluate all methods on domain-shifted speech recognition tasks, including child speech and dialectal variation, across model scales from 0.1B to 2B.
arXiv Detail & Related papers (2025-09-02T20:51:17Z) - Self-Organizing Visual Prototypes for Non-Parametric Representation Learning [6.096888891865663]
We present Self-Organizing Visual Prototypes (SOP), a new training technique for unsupervised visual feature learning. In this strategy, a prototype is represented by many semantically similar representations, or support embeddings (SEs), each containing a complementary set of features. We evaluate the representations learned using the SOP strategy on a range of benchmarks, including retrieval, linear evaluation, fine-tuning, and object detection.
arXiv Detail & Related papers (2025-05-23T20:12:07Z) - Test-Time Alignment for Large Language Models via Textual Model Predictive Control [63.508812485566374]
Textual Model Predictive Control (TMPC) is a novel predictive planning framework adapted for aligning Large Language Models at inference time. TMPC is evaluated on three tasks with distinct segmentation properties: discourse-level translation, long-form response generation, and program synthesis. Results demonstrate that TMPC consistently improves performance, highlighting its generality.
arXiv Detail & Related papers (2025-02-28T07:24:33Z) - GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts [58.95913531746308]
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training.
We propose to generate multiple layer parameters on the fly during inference by a lightweight meta-learned transformer, which we call GeneralizeFormer.
arXiv Detail & Related papers (2025-02-15T10:10:49Z) - Verified Foundations for Differential Privacy [9.513258396376003]
We present SampCert, the first comprehensive, mechanized foundation for differential privacy.
It offers a generic notion of DP, a framework for constructing and composing DP mechanisms, and formally verified implementations of Laplace and Gaussian sampling algorithms.
Indeed, SampCert's verified algorithms power the DP offerings of Amazon Web Services (AWS).
arXiv Detail & Related papers (2024-12-02T16:19:47Z) - Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured experimentally.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z) - Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.
CodeIP is a novel multi-bit watermarking technique that embeds additional information to preserve provenance details.
Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z) - Generalizable Embeddings with Cross-batch Metric Learning [10.553094246710865]
We formulate GAP as a convex combination of learnable prototypes.
We show that the prototype learning can be expressed as an iterative process fitting a linear predictor to a batch of samples.
Building on that perspective, we consider two batches of disjoint classes at each iteration and regularize the learning by expressing the samples of a batch with the prototypes that are fitted to the other batch.
arXiv Detail & Related papers (2023-07-14T20:39:07Z) - DPMLBench: Holistic Evaluation of Differentially Private Machine
Learning [8.568872924668662]
Many studies have recently proposed improved algorithms based on DP-SGD to mitigate utility loss.
More importantly, there is a lack of comprehensive research to compare improvements in these DPML algorithms across utility, defensive capabilities, and generalizability.
We fill this gap by performing a holistic measurement of improved DPML algorithms on utility and defense capability against membership inference attacks (MIAs) on image classification tasks.
arXiv Detail & Related papers (2023-05-10T05:08:36Z) - Coded Residual Transform for Generalizable Deep Metric Learning [34.100840501900706]
We introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability.
CRT represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes.
Our experimental results and ablation studies demonstrate that the proposed CRT method outperforms the state-of-the-art deep metric learning methods by large margins.
arXiv Detail & Related papers (2022-10-09T06:17:31Z) - Automatic Clipping: Differentially Private Deep Learning Made Easier and
Stronger [39.93710312222771]
Per-example clipping is a key algorithmic step that enables practical differentially private (DP) training for deep learning models.
We propose an easy-to-use replacement, called automatic clipping, that eliminates the need to tune R for any DPs.
arXiv Detail & Related papers (2022-06-14T19:49:44Z) - What You See is What You Get: Distributional Generalization for
Algorithm Design in Deep Learning [12.215964287323876]
We investigate and leverage a connection between Differential Privacy (DP) and the notion of Distributional Generalization (DG).
We introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD).
arXiv Detail & Related papers (2022-04-07T05:41:40Z) - Software Vulnerability Detection via Deep Learning over Disaggregated
Code Graph Representation [57.92972327649165]
This work explores a deep learning approach to automatically learn the insecure patterns from code corpora.
Because code naturally admits graph structures with parsing, we develop a novel graph neural network (GNN) to exploit both the semantic context and structural regularity of a program.
arXiv Detail & Related papers (2021-09-07T21:24:36Z) - A Novel Anomaly Detection Algorithm for Hybrid Production Systems based
on Deep Learning and Timed Automata [73.38551379469533]
DAD:DeepAnomalyDetection is a new approach for automatic model learning and anomaly detection in hybrid production systems.
It combines deep learning and timed automata to create a behavioral model from observations.
The algorithm has been applied to a few data sets, including two from real systems, and has shown promising results.
arXiv Detail & Related papers (2020-10-29T08:27:43Z) - Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.