Related papers: Can Small GenAI Language Models Rival Large Language Models in Understanding Application Behavior?

URL: http://arxiv.org/abs/2511.12576v1
Date: Sun, 16 Nov 2025 12:38:28 GMT
Title: Can Small GenAI Language Models Rival Large Language Models in Understanding Application Behavior?
Authors: Mohammad Meymani, Hamed Jelodar, Parisa Hamedi, Roozbeh Razavi-Far, Ali A. Ghorbani
Abstract summary: We evaluate the capabilities of both small and large GenAI language models in understanding application behavior. While larger models generally achieve higher overall accuracy, our experiments show that small GenAI models maintain competitive precision and recall. Our findings demonstrate that small GenAI models can effectively complement large ones, providing a practical balance between performance and resource efficiency.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative AI (GenAI) models, particularly large language models (LLMs), have transformed multiple domains, including natural language processing, software analysis, and code understanding. Their ability to analyze and generate code has enabled applications such as source code summarization, behavior analysis, and malware detection. In this study, we systematically evaluate the capabilities of both small and large GenAI language models in understanding application behavior, with a particular focus on malware detection as a representative task. While larger models generally achieve higher overall accuracy, our experiments show that small GenAI models maintain competitive precision and recall, offering substantial advantages in computational efficiency, faster inference, and deployment in resource-constrained environments. We provide a detailed comparison across metrics such as accuracy, precision, recall, and F1-score, highlighting each model's strengths, limitations, and operational feasibility. Our findings demonstrate that small GenAI models can effectively complement large ones, providing a practical balance between performance and resource efficiency in real-world application behavior analysis.
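The abstract compares models on accuracy, precision, recall, and F1-score. As a reminder of how these four metrics relate in a binary malware-detection setting, here is a minimal sketch that computes them from confusion-matrix counts. It assumes malware is the positive class; the names `ConfusionCounts`, `safe_div`, and `classification_metrics` are hypothetical, and the counts are invented for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, treating 'malware' as the positive class (assumption)."""
    tp: int  # malicious apps correctly flagged as malware
    fp: int  # benign apps wrongly flagged as malware
    tn: int  # benign apps correctly classified as benign
    fn: int  # malicious apps missed (classified as benign)


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = safe_div(c.tp + c.tn, total)   # share of all apps classified correctly
    precision = safe_div(c.tp, c.tp + c.fp)   # share of flagged apps that are malware
    recall = safe_div(c.tp, c.tp + c.fn)      # share of malware that gets flagged
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for two unnamed models; NOT results from the paper.
    examples = {
        "model_a": ConfusionCounts(tp=90, fp=10, tn=85, fn=15),
        "model_b": ConfusionCounts(tp=95, fp=5, tn=92, fn=8),
    }
    for name, counts in examples.items():
        metrics = classification_metrics(counts)
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

For model_a this gives accuracy 0.875, precision 0.9, recall about 0.857, and F1 about 0.878 (equivalently 2TP / (2TP + FP + FN) = 180/205). Returning 0.0 for an empty denominator is one common convention; it avoids a division error when a model never flags anything.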

Related papers

Code Vulnerability Detection Across Different Programming Languages with AI Models
This paper presents implementations of transformer-based models such as CodeBERT and CodeLlama. It shows how off-the-shelf models can acquire predictive capacity through dynamic fine-tuning on vulnerable and safe code fragments. Experiments show that a well-trained CodeBERT can match or even exceed some existing static analyzers, reaching accuracy above 97%.
arXiv (2025-08-14T05:41:58Z)
White-Basilisk: A Hybrid Model for Code Vulnerability Detection
We introduce White-Basilisk, a novel approach to vulnerability detection that demonstrates superior performance. White-Basilisk achieves its vulnerability-detection results with only 200M parameters. This research establishes new benchmarks in code security and provides empirical evidence that compact, efficiently designed models can outperform larger counterparts in specialized tasks.
arXiv (2025-07-11T12:39:25Z)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the LLM era is the generalization issue. We propose the Model Utilization Index (MUI), a mechanism-interpretability-enhanced metric that complements traditional performance scores.
arXiv (2025-04-10T04:09:47Z)
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning. Our approach uses only a small number of samples to search for the desired pruning policy. We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv (2025-03-19T16:07:04Z)
Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification
This research showcases the innovative integration of Large Language Models into machine learning for traffic incident management. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of severity classification.
arXiv (2024-03-20T12:33:51Z)
AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs
AXOLOTL is a novel post-processing framework that operates agnostically across tasks and models. It identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance.
arXiv (2024-03-01T00:02:37Z)
A comprehensible analysis of the efficacy of Ensemble Models for Bug Prediction
We present a comparison and analysis of the efficacy of two AI-based approaches, namely single AI models and ensemble AI models, for predicting the probability that a Java class is buggy. Our experimental findings indicate that an ensemble of AI models can outperform individual AI models.
arXiv (2023-10-18T17:43:54Z)
A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators
Large language models (LLMs) are considered a promising approach to addressing some challenging problems. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications.
arXiv (2023-10-06T21:55:57Z)
Generalization Properties of Retrieval-based Models
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv (2022-10-06T00:33:01Z)
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv (2022-06-09T17:05:34Z)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
We focus on advancing the state-of-the-art in interpreting multimodal models. Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv (2022-03-03T20:52:47Z)
An Application of Pseudo-Log-Likelihoods to Natural Language Scoring
A language model with relatively few parameters and training steps can outperform a larger model on a recent large data set. We produce some absolute state-of-the-art results for commonsense reasoning in binary-choice tasks. We argue that the robustness of the smaller model ought to be understood in terms of compositionality.
arXiv (2022-01-23T22:00:54Z)
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
We present an analysis of Transformer-based language model performance across a wide range of model scales. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language. We discuss the application of language models to AI safety and the mitigation of downstream harms.
arXiv (2021-12-08T19:41:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.