Fault Localization in Deep Learning-based Software: A System-level Approach
- URL: http://arxiv.org/abs/2411.08172v1
- Date: Tue, 12 Nov 2024 20:32:36 GMT
- Title: Fault Localization in Deep Learning-based Software: A System-level Approach
- Authors: Mohammad Mehdi Morovati, Amin Nikanjam, Foutse Khomh
- Abstract summary: We introduce FL4Deep, a system-level fault localization approach considering the entire Deep Learning development pipeline.
In an evaluation using 100 faulty DL scripts, FL4Deep outperformed four previous approaches in terms of accuracy for three out of six DL-related faults.
- Abstract: Over the past decade, Deep Learning (DL) has become an integral part of our daily lives. This surge in DL usage has heightened the need for developing reliable DL software systems. Given that fault localization is a critical task in reliability assessment, researchers have proposed several fault localization techniques for DL-based software, primarily focusing on faults within the DL model. While the DL model is central to DL components, other elements also significantly impact the performance of DL components. As a result, fault localization methods that concentrate solely on the DL model overlook a large portion of the system. To address this, we introduce FL4Deep, a system-level fault localization approach that considers the entire DL development pipeline to effectively localize faults across DL-based systems. In an evaluation using 100 faulty DL scripts, FL4Deep outperformed four previous approaches in terms of accuracy for three out of six DL-related fault types: issues related to data (84%), mismatched libraries between training and deployment (100%), and loss function (69%). Additionally, FL4Deep demonstrated superior precision and recall in fault localization for five fault categories: the three fault types just mentioned, plus insufficient training iterations and activation function faults.
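One of the system-level fault classes the abstract highlights, mismatched libraries between training and deployment, can be illustrated with a minimal check. The sketch below is hypothetical (the function and environment dictionaries are our own illustration, not part of FL4Deep): it compares package versions recorded at training time against those present in the deployment environment and reports any divergence.

```python
def find_library_mismatches(training_env, deployment_env):
    """Return packages whose versions differ between training and deployment.

    Each environment is a mapping of package name -> version string.
    A package missing from the deployment environment is reported with
    a deployment version of None.
    """
    mismatches = {}
    for pkg, train_ver in training_env.items():
        deploy_ver = deployment_env.get(pkg)
        if deploy_ver != train_ver:
            mismatches[pkg] = (train_ver, deploy_ver)
    return mismatches


# Hypothetical version snapshots for the two environments.
training_env = {"tensorflow": "2.12.0", "numpy": "1.23.5"}
deployment_env = {"tensorflow": "2.15.0", "numpy": "1.23.5"}

print(find_library_mismatches(training_env, deployment_env))
# {'tensorflow': ('2.12.0', '2.15.0')}
```

In practice, a system-level approach would capture such snapshots automatically (e.g., from a lock file or the runtime environment) rather than from hand-written dictionaries as above.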
Related papers
- Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection [8.22737389683156]
Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning.
We introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy.
Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
- AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models [95.09157454599605]
Large Language Models (LLMs) are becoming increasingly powerful, but they still exhibit significant but subtle weaknesses.
Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies.
We introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks.
arXiv Detail & Related papers (2024-06-24T15:16:45Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- DOMINO: Domain-aware Loss for Deep Learning Calibration [49.485186880996125]
This paper proposes a novel domain-aware loss function to calibrate deep learning models.
The proposed loss function applies a class-wise penalty based on the similarity between classes within a given target domain.
arXiv Detail & Related papers (2023-02-10T09:47:46Z)
- DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs [15.081278640511998]
DeepFD is a learning-based fault diagnosis and localization framework.
It maps the fault localization task to a learning problem.
It correctly diagnoses 52% of faulty DL programs, roughly double the 27% achieved by the best prior state-of-the-art works.
arXiv Detail & Related papers (2022-05-04T08:15:56Z)
- Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study [4.415977307120617]
We conduct a data-driven analysis of challenges -- and resultant bugs -- involved in writing reliable yet performant imperative DL code.
We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code.
arXiv Detail & Related papers (2022-01-24T21:12:38Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Characterizing Performance Bugs in Deep Learning Systems [7.245989243616551]
We present the first comprehensive study to characterize symptoms, root causes, and exposing stages of performance bugs in deep learning systems.
Our findings shed light on implications for developing high-performance DL systems and for detecting and localizing PBs in DL systems.
We also build the first benchmark of 56 PBs in DL systems, and assess the capability of existing approaches in tackling them.
arXiv Detail & Related papers (2021-12-03T08:08:52Z)
- Deep Learning and Traffic Classification: Lessons learned from a commercial-grade dataset with hundreds of encrypted and zero-day applications [72.02908263225919]
We share our experience on a commercial-grade DL traffic classification engine.
We identify known applications from encrypted traffic, as well as unknown zero-day applications.
We propose a novel technique, tailored for DL models, that is significantly more accurate and light-weight than the state of the art.
arXiv Detail & Related papers (2021-04-07T15:21:22Z)
- An Empirical Study on Deployment Faults of Deep Learning Based Mobile Applications [7.58063287182615]
Mobile Deep Learning (DL) apps integrate DL models trained using large-scale data with DL programs.
This paper presents the first comprehensive study on the deployment faults of mobile DL apps.
We construct a fine-grained taxonomy of 23 categories of fault symptoms and distill common fix strategies for different fault types.
arXiv Detail & Related papers (2021-01-13T08:19:50Z)
- A Survey of Deep Active Learning [54.376820959917005]
Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest samples.
Deep learning (DL) is greedy for data, requiring large amounts of it to optimize massive numbers of parameters.
Deep active learning (DAL) has emerged to combine the two.
arXiv Detail & Related papers (2020-08-30T04:28:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.