Debugging Machine Learning Pipelines
- URL: http://arxiv.org/abs/2002.04640v1
- Date: Tue, 11 Feb 2020 19:13:12 GMT
- Title: Debugging Machine Learning Pipelines
- Authors: Raoni Lourenço and Juliana Freire and Dennis Shasha
- Abstract summary: Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought.
We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning tasks entail the use of complex computational pipelines to
reach quantitative and qualitative conclusions. If some of the activities in a
pipeline produce erroneous or uninformative outputs, the pipeline may fail or
produce incorrect results. Inferring the root cause of failures and unexpected
behavior is challenging, usually requiring much human thought, and is both
time-consuming and error-prone. We propose a new approach that makes use of
iteration and provenance to automatically infer the root causes and derive
succinct explanations of failures. Through a detailed experimental evaluation,
we assess the cost, precision, and recall of our approach compared to the state
of the art. Our source code and experimental data will be available for
reproducibility and enhancement.
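The abstract describes inferring root causes from the provenance of pipeline runs without giving details. As a rough illustration of the general idea (not the paper's actual algorithm, which iteratively re-executes pipeline variants), a minimal sketch might record each run's parameter choices and outcome, then look for minimal sets of choices present in every failing run but absent from all passing runs. The data layout and function name here are hypothetical:

```python
from itertools import combinations

def infer_root_causes(runs, max_size=2):
    """Find minimal sets of (parameter, value) pairs that occur in every
    failing run and in no passing run -- a simple provenance-based
    root-cause heuristic over recorded pipeline runs."""
    failing = [frozenset(r["config"].items()) for r in runs if not r["ok"]]
    passing = [frozenset(r["config"].items()) for r in runs if r["ok"]]
    # Only properties shared by all failing runs can explain every failure.
    common = frozenset.intersection(*failing) if failing else frozenset()
    causes = []
    for size in range(1, max_size + 1):
        for cand in combinations(sorted(common), size):
            cset = set(cand)
            # Skip supersets of an already-found smaller cause (keep minimal).
            if any(c <= cset for c in causes):
                continue
            # A cause must never appear in a successful run.
            if not any(cset <= p for p in passing):
                causes.append(cset)
    return causes

# Hypothetical run provenance: parameter choices plus a pass/fail outcome.
runs = [
    {"config": {"imputer": "mean", "model": "svm"}, "ok": True},
    {"config": {"imputer": "drop", "model": "svm"}, "ok": False},
    {"config": {"imputer": "drop", "model": "tree"}, "ok": False},
]
print(infer_root_causes(runs))  # implicates the "drop" imputer choice
```

The exhaustive candidate enumeration is exponential in `max_size`; the paper's contribution is precisely avoiding such brute force via iteration and provenance, so this sketch only conveys the problem shape.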
Related papers
- Generalization Error in Quantum Machine Learning in the Presence of Sampling Noise (2024-10-18)
Eigentask Learning is a framework for learning with infinite input training data in the presence of output sampling noise.
We calculate the training and generalization errors of a generic quantum machine learning system when the input training dataset and output measurement sampling shots are both finite.
- Quantum Internet: Resource Estimation for Entanglement Routing (2024-10-14)
We consider the problem of estimating the physical resources required for routing entanglement in a quantum network.
We propose a novel way of accounting for experimental errors in the purification process.
We show that the approximation works reasonably well over a wide range of errors.
- When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails (2024-07-08)
We develop a synthetic pipeline to generate targeted and labeled data.
We show that our method achieves competitive performance at a fraction of the compute cost.
- Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE (2024-06-20)
We propose QuEE, a more general dynamic network that combines both quantization and early exiting.
Our algorithm can be seen as a form of soft early exiting or input-dependent compression.
The crucial factor in our approach is accurate prediction of the potential accuracy improvement achievable through further computation.
- DeepFunction: Deep Metric Learning-based Imbalanced Classification for Diagnosing Threaded Pipe Connection Defects using Functional Data (2024-04-04)
In modern manufacturing, most products are conforming; a few are nonconforming, with different defect types.
Identifying defect types helps further root cause diagnosis of production lines.
We propose an innovative classification framework based on deep metric learning using functional data (DeepFunction).
- Multi-modal Causal Structure Learning and Root Cause Analysis (2024-02-04)
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' (2023-11-16)
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence whether or not it knows the answer.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate that R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown ones.
- Doubly Robust Proximal Causal Learning for Continuous Treatments (2023-09-22)
We propose a kernel-based doubly robust causal learning estimator for continuous treatments.
We show that its oracle form is a consistent approximation of the influence function.
We then provide a comprehensive convergence analysis in terms of the mean square error.
- Task-specific experimental design for treatment effect estimation (2023-06-08)
Randomised controlled trials (RCTs) are the standard for causal inference.
Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought.
We develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications.
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes (2022-09-21)
Anomaly detection describes methods of finding abnormal states, instances, or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
- A Survey on Extraction of Causal Relations from Natural Language Text (2021-01-16)
Cause-effect relations appear frequently in text, and curating them helps in building causal networks for predictive tasks.
Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.