Test Set Optimization by Machine Learning Algorithms
- URL: http://arxiv.org/abs/2010.15240v1
- Date: Wed, 28 Oct 2020 21:24:06 GMT
- Title: Test Set Optimization by Machine Learning Algorithms
- Authors: Kaiming Fu and Yulu Jin and Zhousheng Chen
- Abstract summary: We propose several machine learning based methods to predict the minimum amount of test data that produces a relatively accurate diagnosis.
We develop a prediction model to fit the data and determine when to terminate testing.
Numerical results show that SVM reaches a diagnosis accuracy of 90.4% while reducing the test-set volume by 35.24%.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diagnosis results are highly dependent on the volume of the test set. To
derive the most efficient test set, we propose several machine learning based
methods to predict the minimum amount of test data that produces a relatively
accurate diagnosis. By collecting outputs from failing circuits, a feature matrix
and label vector are generated, which encode the inference information about the
test termination point. We then develop a prediction model to fit the data and
determine when to terminate testing. The considered methods include LASSO and
Support Vector Machine (SVM), where the relationship between the goal (label)
and the predictors (feature matrix) is assumed to be linear in LASSO and
nonlinear in SVM. Numerical results show that SVM reaches a diagnosis accuracy
of 90.4% while reducing the test-set volume by 35.24%.
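The abstract's two-model setup (a linear LASSO fit versus a nonlinear kernel SVM fit on a feature matrix and label vector) can be sketched with scikit-learn. The data below is synthetic (the paper's real feature matrix comes from failing-circuit outputs, which are not reproduced here), and the model settings (alpha=0.1, an RBF kernel) are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVR

# Synthetic stand-in for the paper's setup: rows of the feature
# matrix play the role of failing-circuit outputs, and the label
# is the (continuous) test termination point to predict.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                             # hypothetical feature matrix
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)   # hypothetical termination labels

# LASSO assumes a linear label/feature relationship and sparsifies weights.
lasso = Lasso(alpha=0.1).fit(X, y)
# An RBF-kernel SVM regressor allows a nonlinear relationship.
svm = SVR(kernel="rbf").fit(X, y)

print(lasso.score(X, y), svm.score(X, y))  # R^2 of each fit
```

In practice one would compare held-out prediction quality of the two fits and terminate testing once the predicted termination point is reached.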
Related papers
- Evaluation of Artificial Intelligence Methods for Lead Time Prediction in Non-Cycled Areas of Automotive Production [1.3499500088995464]
The present study examines the effectiveness of applying Artificial Intelligence methods in an automotive production environment.
Data structures are analyzed to identify contextual features and then preprocessed using one-hot encoding.
The research demonstrates that AI methods can be effectively applied to highly variable production data, adding business value.
arXiv Detail & Related papers (2025-01-13T13:28:03Z)
- A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinct features extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z)
- Training on the Benchmark Is Not All You Need [52.01920740114261]
We propose a simple and effective data leakage detection method based on the contents of multiple-choice options.
Our method is able to work under black-box conditions without access to model training data or weights.
We evaluate the degree of data leakage of 31 mainstream open-source LLMs on four benchmark datasets.
arXiv Detail & Related papers (2024-09-03T11:09:44Z)
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z)
- Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework [0.0]
This paper presents the Fault Injection for Undesirable Learning in input Data (FIUL-Data) testing framework.
Data mutators explore vulnerabilities of ML systems against the effects of different fault injections.
This paper evaluates the framework using data from analytical chemistry, comprising retention time measurements of anti-sense oligonucleotides.
arXiv Detail & Related papers (2023-09-20T12:58:35Z) - Lab-scale Vibration Analysis Dataset and Baseline Methods for Machinery
Fault Diagnosis with Machine Learning [1.8352113484137629]
This paper presents a dataset of vibration signals from a lab-scale machine.
The performance of the algorithms is evaluated using weighted accuracy (WA) since the data is balanced.
The best-performing algorithm is the SVM with a WA of 99.75% on the 5-fold cross-validations.
arXiv Detail & Related papers (2022-12-27T00:23:59Z) - Learning to predict test effectiveness [1.4213973379473652]
This article offers a machine learning model to predict the extent to which the test could cover a class in terms of a new metric called Coverageability.
We offer a mathematical model to evaluate test effectiveness in terms of size and coverage of the test suite generated automatically for each class.
arXiv Detail & Related papers (2022-08-20T07:26:59Z) - Primal Estimated Subgradient Solver for SVM for Imbalanced
Classification [0.0]
We aim to demonstrate that our cost sensitive PEGASOS SVM achieves good performance on imbalanced data sets with a Majority to Minority Ratio ranging from 8.6:1 to 130:1.
We evaluate the performance by examining the learning curves.
We benchmark our PEGASOS Cost-Sensitive SVM's results against Ding's LINEAR SVM DECIDL method.
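As a rough illustration of the cost-sensitive primal approach, scikit-learn's SGDClassifier with hinge loss performs Pegasos-style stochastic subgradient updates, and class_weight="balanced" supplies the cost weighting. The imbalanced data below is synthetic and only loosely mimics the low end (about 10:1) of the paper's 8.6:1 to 130:1 range; it is not the paper's benchmark data.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic imbalanced data: ~10:1 majority:minority ratio.
rng = np.random.default_rng(1)
X_maj = rng.normal(loc=0.0, size=(500, 2))   # majority class cluster
X_min = rng.normal(loc=3.0, size=(50, 2))    # minority class cluster
X = np.vstack([X_maj, X_min])
y = np.array([0] * 500 + [1] * 50)

# Hinge loss + SGD is a Pegasos-style primal SVM solver;
# "balanced" reweights errors inversely to class frequency.
clf = SGDClassifier(loss="hinge", class_weight="balanced", random_state=0)
clf.fit(X, y)
print(clf.score(X, y))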
arXiv Detail & Related papers (2022-06-19T02:33:14Z) - Cognitive Diagnosis with Explicit Student Vector Estimation and
Unsupervised Question Matrix Learning [53.79108239032941]
We propose an explicit student vector estimation (ESVE) method to estimate the student vectors of DINA.
We also propose an unsupervised method called bidirectional calibration algorithm (HBCA) to label the Q-matrix automatically.
The experimental results on two real-world datasets show that ESVE-DINA outperforms the DINA model on accuracy and that the Q-matrix labeled automatically by HBCA can achieve performance comparable to that obtained with the manually labeled Q-matrix.
arXiv Detail & Related papers (2022-03-01T03:53:19Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.