Challenges in Migrating Imperative Deep Learning Programs to Graph
Execution: An Empirical Study
- URL: http://arxiv.org/abs/2201.09953v1
- Date: Mon, 24 Jan 2022 21:12:38 GMT
- Title: Challenges in Migrating Imperative Deep Learning Programs to Graph
Execution: An Empirical Study
- Authors: Tatiana Castro Vélez, Raffi Khatchadourian, Mehdi Bagherzadeh, Anita
Raja
- Abstract summary: We conduct a data-driven analysis of challenges -- and resultant bugs -- involved in writing reliable yet performant imperative DL code.
We put forth several recommendations, best practices, and anti-patterns for effectively hybridizing imperative DL code.
- Score: 4.415977307120617
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Efficiency is essential to support responsiveness w.r.t. ever-growing
datasets, especially for Deep Learning (DL) systems. DL frameworks have
traditionally embraced deferred execution-style DL code that supports symbolic,
graph-based Deep Neural Network (DNN) computation. While scalable, such
development tends to produce DL code that is error-prone, non-intuitive, and
difficult to debug. Consequently, more natural, less error-prone imperative DL
frameworks encouraging eager execution have emerged but at the expense of
run-time performance. While hybrid approaches aim for the "best of both
worlds," the challenges in applying them in the real world are largely unknown.
We conduct a data-driven analysis of challenges -- and resultant bugs --
involved in writing reliable yet performant imperative DL code by studying 250
open-source projects, consisting of 19.7 MLOC, along with 470 and 446 manually
examined code patches and bug reports, respectively. The results indicate that
hybridization: (i) is prone to API misuse, (ii) can result in performance
degradation -- the opposite of its intention, and (iii) has limited application
due to execution mode incompatibility. We put forth several recommendations,
best practices, and anti-patterns for effectively hybridizing imperative DL
code, potentially benefiting DL practitioners, API designers, tool developers,
and educators.
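For context on what "hybridizing" means in practice: frameworks such as TensorFlow expose this via tf.function, which traces otherwise imperative (eager) Python code into a callable graph. The sketch below is a minimal illustration assuming TensorFlow 2.x, not code from the studied projects; the last few lines show a retracing anti-pattern of the kind behind finding (ii).

```python
# Minimal sketch of hybridization with tf.function (assumes TensorFlow 2.x).
import tensorflow as tf

def dense_eager(x, w, b):
    # Imperative (eager) style: runs op-by-op, easy to debug.
    return tf.nn.relu(tf.matmul(x, w) + b)

@tf.function  # Hybridization: traces the function into a static graph.
def dense_graph(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([64, 128])
w = tf.random.normal([128, 10])
b = tf.zeros([10])
print(dense_eager(x, w, b).shape)  # (64, 10), eager execution
print(dense_graph(x, w, b).shape)  # (64, 10), graph execution

@tf.function
def scale(x, factor):
    return x * factor

# Anti-pattern: each new Python scalar forces a retrace, so the "fast"
# graph version can end up slower than the eager one (finding ii).
for f in [1.0, 2.0, 3.0]:
    scale(x, f)               # three separate traces
scale(x, tf.constant(2.0))    # passing a Tensor avoids retracing
```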
Related papers
- ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models [67.75439511654078]
Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. They face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. We propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment.
arXiv Detail & Related papers (2025-07-01T16:01:08Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models [48.361839372110246]
We develop an automated instruction generation pipeline that performs constraint expansion, conflict detection, and instruction rewriting. We evaluate 19 large language models and uncover substantial variation in performance across constraint forms. In-depth analysis indicates that these gains stem primarily from modifications in the parameters of the model's attention modules.
arXiv Detail & Related papers (2025-05-12T14:16:55Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the large language model (LLM) era is the generalization issue. We propose the Model Utilization Index (MUI), a mechanism-interpretability-enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - Safe Automated Refactoring for Efficient Migration of Imperative Deep Learning Programs to Graph Execution [4.461099699060121]
We present an automated approach to determine when it is safe and potentially advantageous to migrate imperative DL code to graph execution.
The approach is implemented as a PyDev Eclipse plug-in that integrates the WALA Ariadne analysis framework.
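As a minimal illustration of why such safety analysis matters (a hand-written sketch assuming TensorFlow 2.x, not output of the plug-in): Python side effects execute only while tf.function traces, so a naive migration can silently change program behavior.

```python
# Hypothetical example of an unsafe migration (assumes TensorFlow 2.x).
import tensorflow as tf

calls = []

@tf.function
def step(x):
    calls.append("ran")  # Python side effect: executes only at trace time
    return x + 1

step(tf.constant(1.0))
step(tf.constant(2.0))  # same input signature: no retrace, no append
print(len(calls))  # 1 under graph execution; eager code would print 2
```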
arXiv Detail & Related papers (2025-04-07T18:48:43Z) - Deep-Bench: Deep Learning Benchmark Dataset for Code Generation [2.897621520197328]
DeepBench is a novel benchmark dataset for function-level deep learning code generation.
GPT-4o -- the state-of-the-art LLM -- achieved 31% accuracy on DeepBench, significantly lower than its 60% on DS-1000.
DeepBench offers valuable insights into the LLMs' performance and areas for potential improvement in the DL domain.
arXiv Detail & Related papers (2025-02-26T00:43:50Z) - More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [51.497338578427915]
Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. DrICL is a novel optimization method that enhances model performance through Differentiated and Reweighting objectives. We develop the Many-Shot ICL Benchmark (ICL-50), a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens.
arXiv Detail & Related papers (2025-01-07T14:57:08Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated than canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - Science-Informed Deep Learning (ScIDL) With Applications to Wireless Communications [11.472232944923558]
This article provides a tutorial on science-informed deep learning (ScIDL).
ScIDL aims to integrate existing scientific knowledge with DL techniques to develop more powerful algorithms.
We discuss both recent applications of ScIDL and potential future research directions in the field of wireless communications.
arXiv Detail & Related papers (2024-06-29T02:35:39Z) - On the Worst Prompt Performance of Large Language Models [93.13542053835542]
The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts.
We introduce RobustAlpacaEval, a new benchmark that consists of semantically equivalent case-level queries.
Experiments on RobustAlpacaEval with ChatGPT and six open-source LLMs from the Llama, Mistral, and Gemma families uncover substantial variability in model performance.
arXiv Detail & Related papers (2024-06-08T13:40:38Z) - DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection [12.686480870065827]
This paper contributes DLAP, a framework that combines the best of both deep learning (DL) models and Large Language Models (LLMs) to achieve exceptional vulnerability detection performance.
Experiment results confirm that DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts.
arXiv Detail & Related papers (2024-05-02T11:44:52Z) - Towards Safe Automated Refactoring of Imperative Deep Learning Programs
to Graph Execution [4.786072763033669]
More natural, less error-prone imperative DL frameworks encouraging eager execution have emerged at the expense of run-time performance.
We present our ongoing work on an automated approach that assists developers in specifying whether and how their otherwise imperative DL code could be reliably and efficiently executed as graphs.
The approach is being implemented as a PyDev Eclipse plug-in and uses the WALA Ariadne analysis framework.
arXiv Detail & Related papers (2023-08-22T20:50:19Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual
Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - DeepFD: Automated Fault Diagnosis and Localization for Deep Learning
Programs [15.081278640511998]
DeepFD is a learning-based fault diagnosis and localization framework.
It maps the fault localization task to a learning problem.
It correctly diagnoses 52% of faulty DL programs, roughly double the 27% achieved by the best state-of-the-art works.
arXiv Detail & Related papers (2022-05-04T08:15:56Z) - Automatic Fault Detection for Deep Learning Programs Using Graph
Transformations [13.572917264310119]
We propose NeuraLint, a model-based fault detection approach for Deep Learning programs.
NeuraLint effectively detects faults and design issues in both synthesized and real-world examples with a recall of 70.5% and a precision of 100%.
Although the proposed meta-model is designed for feedforward neural networks, it can be extended to support other neural network architectures.
arXiv Detail & Related papers (2021-05-17T18:06:11Z) - CogDL: A Comprehensive Library for Graph Deep Learning [55.694091294633054]
We present CogDL, a library for graph deep learning that allows researchers and practitioners to conduct experiments, compare methods, and build applications with ease and efficiency.
In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries.
We develop efficient sparse operators for CogDL, making it the most competitive graph library in terms of efficiency.
arXiv Detail & Related papers (2021-03-01T12:35:16Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL
primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.