Improving Deep Learning Library Testing with Machine Learning
- URL: http://arxiv.org/abs/2602.03755v1
- Date: Tue, 03 Feb 2026 17:19:01 GMT
- Title: Improving Deep Learning Library Testing with Machine Learning
- Authors: Facundo Molina, M M Abid Naziri, Feiran Qin, Alessandra Gorla, Marcelo d'Amorim
- Abstract summary: We explore using machine learning (ML) to determine input validity. Tensor shapes are a precise abstraction to encode concrete inputs and capture relationships of the data. We show that ML-enhanced input classification is an important aid to scale DL library testing.
- Score: 40.21709249669499
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep Learning (DL) libraries like TensorFlow and PyTorch simplify machine learning (ML) model development but are prone to bugs due to their complex design. Bug-finding techniques exist, but without precise API specifications, they produce many false alarms. Existing methods to mine API specifications lack accuracy. We explore using ML classifiers to determine input validity. We hypothesize that tensor shapes are a precise abstraction to encode concrete inputs and capture relationships of the data. Shape abstraction severely reduces problem dimensionality, which is important to facilitate ML training. Labeled data are obtained by observing runtime outcomes on a sample of inputs, and classifiers are trained on sets of labeled inputs to capture API constraints. Our evaluation, conducted over 183 APIs from TensorFlow and PyTorch, shows that the classifiers generalize well on unseen data with over 91% accuracy. Integrating these classifiers into the pipeline of ACETest, a SoTA bug-finding technique, improves its pass rate from ~29% to ~61%. Our findings suggest that ML-enhanced input classification is an important aid to scale DL library testing.
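The abstract's pipeline (abstract inputs to tensor shapes, label them by runtime outcome, train a classifier to predict validity) can be illustrated with a minimal, hypothetical sketch. This is not the paper's implementation: the feature encoding, the `matmul`-style example API, and the use of a stdlib 1-nearest-neighbor classifier in place of the paper's trained models are all assumptions made for illustration.

```python
# Hypothetical sketch of shape-abstraction-based input classification.
# Assumption: each input is a list of tensor shapes; labels come from
# observing whether the API accepted (1) or rejected (0) the input.

def shape_features(shapes, max_rank=4):
    """Flatten a list of tensor shapes into one fixed-length vector,
    padding each shape to max_rank dimensions with -1."""
    feats = []
    for shape in shapes:
        dims = list(shape)[:max_rank]
        dims += [-1] * (max_rank - len(dims))
        feats.extend(dims)
    return feats

def predict(train, x):
    """1-nearest-neighbor: return the label of the closest
    training example under squared Euclidean distance."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda ex: dist(ex[0], x))[1]

# Labeled shape pairs for a hypothetical matmul-like binary API
# (valid iff the inner dimensions match), as observed at runtime.
samples = [
    ([(2, 3), (3, 4)], 1),   # inner dims match -> accepted
    ([(2, 3), (5, 4)], 0),   # inner dims mismatch -> rejected
    ([(4, 4), (4, 4)], 1),
    ([(1, 2), (3, 2)], 0),
    ([(3, 5), (5, 1)], 1),
    ([(2, 2), (1, 3)], 0),
]
train = [(shape_features(shapes), label) for shapes, label in samples]

# Classify an unseen (and actually valid) shape pair before
# handing it to the real API.
print(predict(train, shape_features([(6, 7), (7, 2)])))  # -> 1
```

Such a classifier acts as a cheap input filter in front of a fuzzer: inputs predicted invalid are discarded instead of being counted as (false-alarm) crashes, which is how the paper reports raising ACETest's pass rate.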
Related papers
- Testing Deep Learning Libraries via Neurosymbolic Constraint Learning [3.491101173753068]
Deep Learning (DL) libraries (e.g., PyTorch) are popular in AI development. A key challenge in testing DL libraries is the lack of API specifications. We develop Centaur -- the first neurosymbolic technique to test DL library APIs using dynamically learned input constraints.
arXiv Detail & Related papers (2026-01-21T21:54:41Z) - Constraint-Guided Unit Test Generation for Machine Learning Libraries [8.883254370291256]
Machine learning (ML) libraries such as PyTorch and TensorFlow are essential for a wide range of modern applications. Ensuring the correctness of ML libraries through testing is crucial. In this paper, we present PynguinML, an approach that improves the Pynguin test generator to leverage API constraints.
arXiv Detail & Related papers (2025-10-10T08:02:15Z) - LeakageDetector: An Open Source Data Leakage Analysis Tool in Machine Learning Pipelines [3.5453450990441238]
Our work seeks to enable Machine Learning (ML) engineers to write better code by helping them find and fix instances of Data Leakage in their models. ML developers must carefully separate their data into training, evaluation, and test sets to avoid introducing Data Leakage into their code. In this paper, we develop LEAKAGEDETECTOR, a Python plugin that identifies instances of Data Leakage in ML code and provides suggestions on how to remove the leakage.
arXiv Detail & Related papers (2025-03-18T20:53:44Z) - Analysis of Zero Day Attack Detection Using MLP and XAI [0.0]
This paper analyzes Machine Learning (ML) and Deep Learning (DL) based approaches to create Intrusion Detection Systems (IDS). The focus is on the KDD99 dataset, the most widely studied dataset for detecting zero-day attacks. We evaluate the performance of four multilayer perceptron (MLP) models trained on the KDD99 dataset, including baseline ML models, weighted ML models, truncated ML models, and weighted truncated ML models.
arXiv Detail & Related papers (2025-01-28T02:20:34Z) - Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models [49.214291813478695]
Deep learning (DL) libraries, widely used in AI applications, often contain vulnerabilities like buffer overflows and use-after-free errors. Traditional fuzzing struggles with the complexity and API diversity of DL libraries. We propose DFUZZ, an LLM-driven fuzzing approach for DL libraries.
arXiv Detail & Related papers (2025-01-08T07:07:22Z) - Subgraph-Oriented Testing for Deep Learning Libraries [9.78188667672054]
We propose SORT (Subgraph-Oriented Realistic Testing) to test Deep Learning (DL) libraries on different hardware platforms. SORT takes popular API interaction patterns, represented as frequent subgraphs of model graphs, as test subjects. SORT achieves a 100% valid input generation rate, detects more precision bugs than existing methods, and reveals interaction-related bugs missed by single-API testing.
arXiv Detail & Related papers (2024-12-09T12:10:48Z) - Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries [8.779035160734523]
This paper introduces an LLM-enhanced differential testing technique for DL libraries. It addresses the challenges of finding alternative implementations for a given API and generating diverse test inputs. It synthesizes counterparts for 1.84 times as many APIs as those found by state-of-the-art techniques.
arXiv Detail & Related papers (2024-06-12T07:06:38Z) - XAL: EXplainable Active Learning Makes Classifiers Better Low-resource Learners [71.8257151788923]
We propose a novel Explainable Active Learning framework (XAL) for low-resource text classification. XAL encourages classifiers to justify their inferences and delve into unlabeled data for which they cannot provide reasonable explanations. Experiments on six datasets show that XAL achieves consistent improvement over 9 strong baselines.
arXiv Detail & Related papers (2023-10-09T08:07:04Z) - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca [62.65877150123775]
We use Boundless DAS to efficiently search for interpretable causal structure in large language models while they follow instructions.
Our findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models.
arXiv Detail & Related papers (2023-05-15T17:15:40Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.