Software Metadata Classification based on Generative Artificial
Intelligence
- URL: http://arxiv.org/abs/2310.13006v1
- Date: Sat, 14 Oct 2023 07:38:16 GMT
- Title: Software Metadata Classification based on Generative Artificial
Intelligence
- Authors: Seetharam Killivalavan, Durairaj Thenmozhi
- Abstract summary: This paper presents a novel approach to enhance the performance of binary code comment quality classification models through the application of Generative Artificial Intelligence (AI).
By leveraging the OpenAI API, a dataset comprising 1239 newly generated code-comment pairs has been labelled as "Useful" or "Not Useful".
The results affirm the effectiveness of this methodology, indicating its applicability in broader contexts within software development and quality assurance domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel approach to enhance the performance of binary
code comment quality classification models through the application of
Generative Artificial Intelligence (AI). By leveraging the OpenAI API, a
dataset comprising 1239 newly generated code-comment pairs, extracted from
various GitHub repositories and open-source projects, has been labelled as
"Useful" or "Not Useful", and integrated into the existing corpus of 9048 pairs
in the C programming language. Employing a cutting-edge Large Language Model
Architecture, the generated dataset demonstrates notable improvements in model
accuracy. Specifically, when incorporated into the Support Vector Machine (SVM)
model, a 6-percentage-point increase in precision is observed, rising from 0.79 to 0.85.
Additionally, the Artificial Neural Network (ANN) model exhibits a 1.5-percentage-point
increase in recall, climbing from 0.731 to 0.746. This paper sheds light on the
potential of Generative AI in augmenting code comment quality classification
models. The results affirm the effectiveness of this methodology, indicating
its applicability in broader contexts within software development and quality
assurance domains. The findings underscore the significance of integrating
generative techniques to advance the accuracy and efficacy of machine learning
models in practical software engineering scenarios.
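The gains reported in the abstract are differences between absolute metric values (precision 0.79 to 0.85, recall 0.731 to 0.746). The following is a minimal sketch, not the authors' code, of how precision and recall are conventionally computed for the "Useful" class and how an absolute gain differs from a relative one. The function names and the toy labels are hypothetical; only the four metric values come from the abstract.

```python
def precision_recall(y_true, y_pred, positive="Useful"):
    """Return (precision, recall) for the positive class.

    Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for a metric whose denominator is zero.
    """
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def absolute_gain(before, after):
    """Difference between two metric values (x100 gives percentage points)."""
    return after - before


def relative_gain(before, after):
    """Gain expressed as a fraction of the starting value."""
    return (after - before) / before


# Toy labels, invented purely for illustration (not from the paper's dataset).
truth = ["Useful", "Not Useful", "Useful", "Useful"]
preds = ["Useful", "Useful", "Not Useful", "Useful"]
p, r = precision_recall(truth, preds)
print(f"toy precision={p:.3f} recall={r:.3f}")

# Metric values quoted in the abstract.
for name, before, after in [
    ("SVM precision", 0.79, 0.85),
    ("ANN recall", 0.731, 0.746),
]:
    print(
        f"{name}: absolute {absolute_gain(before, after) * 100:.1f} points, "
        f"relative {relative_gain(before, after) * 100:.1f}%"
    )
```

Read this way, the SVM precision gain is 6 points in absolute terms and roughly 7.6% relative to the 0.79 baseline; the ANN recall gain is 1.5 points, or roughly 2.1% relative to 0.731.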
Related papers
- Rare Class Prediction Model for Smart Industry in Semiconductor Manufacturing [1.3955252961896323]
This study develops a rare class prediction approach for in situ data collected from a smart semiconductor manufacturing process.
The primary objective is to build a model that addresses issues of noise and class imbalance, enhancing class separation.
The model was evaluated using various performance metrics, with ROC curves showing an AUC of 0.95, a precision of 0.66, and a recall of 0.96.
arXiv Detail & Related papers (2024-06-06T22:09:43Z)
- Detecting AI Generated Text Based on NLP and Machine Learning Approaches [0.0]
Recent advances in natural language processing may enable AI models to generate writing that is identical to human written form in the future.
This might have profound ethical, legal, and social repercussions.
Our approach includes machine learning methods that can differentiate between electronically produced text and human-written text.
arXiv Detail & Related papers (2024-04-15T16:37:44Z)
- AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving [68.73885845181242]
We propose an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios.
We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
arXiv Detail & Related papers (2024-03-26T04:27:56Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves, for example, the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Generative AI for Software Metadata: Overview of the Information
Retrieval in Software Engineering Track at FIRE 2023 [18.616716369775883]
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments.
The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open source C based projects.
The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
arXiv Detail & Related papers (2023-10-27T14:13:23Z)
- Tool-Augmented Reward Modeling [58.381678612409]
We propose a tool-augmented preference modeling approach, named Themis, to address limitations by empowering RMs with access to external environments.
Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources.
In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines.
arXiv Detail & Related papers (2023-10-02T09:47:40Z)
- On the Reliability and Explainability of Language Models for Program
Generation [15.569926313298337]
We study the capabilities and limitations of automated program generation approaches.
We employ advanced explainable AI approaches to highlight the tokens that significantly contribute to the code transformation.
Our analysis reveals that, in various experimental scenarios, language models can recognize code grammar and structural information, but they exhibit limited robustness to changes in input sequences.
arXiv Detail & Related papers (2023-02-19T14:59:52Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and
Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- A survey on Variational Autoencoders from a GreenAI perspective [0.0]
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks.
This article provides a comparative evaluation of some of the most successful, recent variations of VAEs.
arXiv Detail & Related papers (2021-03-01T15:26:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.