Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case
- URL: http://arxiv.org/abs/2407.18544v1
- Date: Fri, 26 Jul 2024 06:50:17 GMT
- Title: Utilising Explainable Techniques for Quality Prediction in a Complex Textiles Manufacturing Use Case
- Authors: Briony Forsberg, Dr Henry Williams, Prof Bruce MacDonald, Tracy Chen, Dr Reza Hamzeh, Dr Kirstine Hulse
- Abstract summary: This paper develops an approach to classify instances of product failure in a complex textiles manufacturing dataset using explainable techniques.
In investigating the trade-off between accuracy and explainability, three different tree-based classification algorithms were evaluated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper develops an approach to classifying instances of product failure in a complex textiles manufacturing dataset using explainable techniques. The dataset used in this study was obtained from a New Zealand manufacturer of woollen carpets and rugs. To investigate the trade-off between accuracy and explainability, three tree-based classification algorithms were evaluated: a Decision Tree and two ensemble methods, Random Forest and XGBoost. Three feature selection methods were also evaluated: SelectKBest with chi-squared as the scoring function, the Pearson Correlation Coefficient, and the Boruta algorithm. Unsurprisingly, the ensemble methods typically produced better results than the Decision Tree model, and the Random Forest model combined with Boruta feature selection yielded the best results overall. Finally, a tree-ensemble explanation technique was used to extract rule lists capturing necessary and sufficient conditions for classification by a trained model, in a form that can be easily interpreted by a human. Notably, several features appearing in the extracted rule lists were statistical and calculated features that had been added to the original dataset, demonstrating the influence that additional information introduced during the data preprocessing stages can have on final model performance.
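The paper's code is not included here, so the following is only a rough sketch of how the best-performing combination reported in the abstract (Boruta feature selection followed by a Random Forest, with a Decision Tree and XGBoost trained for comparison) might be assembled with scikit-learn, XGBoost and the boruta package. The file name, column names and the binary failure label are hypothetical placeholders, not details from the paper.

```python
# Minimal sketch (not the authors' code): Boruta feature selection followed by a
# Random Forest classifier, with a Decision Tree and XGBoost for comparison.
# Dataset path, column names and the "failure" target are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from xgboost import XGBClassifier
from boruta import BorutaPy

df = pd.read_csv("carpet_production.csv")        # hypothetical preprocessed dataset
X = df.drop(columns=["failure"]).values          # numeric features after encoding
y = df["failure"].values                         # 1 = product failure, 0 = pass

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Boruta wraps a tree-based estimator and keeps only features that beat their
# randomly permuted "shadow" copies.
rf_for_boruta = RandomForestClassifier(n_jobs=-1, class_weight="balanced", max_depth=5)
boruta = BorutaPy(rf_for_boruta, n_estimators="auto", random_state=42)
boruta.fit(X_tr, y_tr)
X_tr_sel, X_te_sel = boruta.transform(X_tr), boruta.transform(X_te)

# Train the three tree-based classifiers on the Boruta-selected features.
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}
for name, model in models.items():
    model.fit(X_tr_sel, y_tr)
    print(name, "F1:", round(f1_score(y_te, model.predict(X_te_sel)), 3))
```

The other two feature selection methods mentioned in the abstract also map naturally onto scikit-learn: SelectKBest(chi2, k=...) for the chi-squared scoring, and a simple threshold on Pearson correlations computed from the training split.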
Related papers
- Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks [59.47851630504264]
Free-text explanations are expressive and easy to understand, but many datasets lack annotated explanation data.
We fine-tune T5-Large and OLMo-7B models and assess the impact of fine-tuning data quality, the number of fine-tuning samples, and few-shot selection methods.
The models are evaluated on 19 diverse OOD datasets across three tasks: natural language inference (NLI), fact-checking, and hallucination detection in abstractive summarization.
arXiv Detail & Related papers (2025-02-07T10:01:32Z)
- Which Imputation Fits Which Feature Selection Method? A Survey-Based Simulation Study [4.335350817722218]
Feature importance measures are usually considered for feature selection as well as to assess the effect of features on the outcome variables in the model.
When the data contain missing values, the typical solution is to impute them before applying the learning method.
We consider the two most common tree-based methods, Random Forest and XGBoost, and an interpretable linear model with regularization.
arXiv Detail & Related papers (2024-12-18T07:36:03Z)
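The survey summarized above pairs imputation methods with feature selection methods. Purely as an illustrative sketch (not the survey's experimental design), the snippet below imputes missing values before ranking features with Random Forest importances, which is the typical workflow the summary describes; the dataset and column names are hypothetical.

```python
# Illustrative sketch only: impute missing values first, then rank features by
# Random Forest importance. Dataset and column names are hypothetical.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("data_with_missing_values.csv")   # hypothetical dataset
X, y = df.drop(columns=["target"]), df["target"]

# Median imputation is one simple choice; the survey compares several alternatives.
imputer = SimpleImputer(strategy="median")
X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_imp, y)
ranking = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.head(10))   # top-ranked features, e.g. for downstream selection
```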
- A Unified Approach to Extract Interpretable Rules from Tree Ensembles via Integer Programming [2.1408617023874443]
Tree ensembles are very popular machine learning models, known for their effectiveness in supervised classification and regression tasks.
Our work aims to extract an optimized list of rules from a trained tree ensemble, providing the user with a condensed, interpretable model that retains most of the predictive power of the full model.
Our extensive computational experiments offer statistically significant evidence that our method is competitive with other rule extraction methods in terms of predictive performance and fidelity towards the tree ensemble.
arXiv Detail & Related papers (2024-06-30T22:33:47Z)
- Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose OCTree, a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z)
- Learning accurate and interpretable decision trees [27.203303726977616]
We develop approaches to design decision tree learning algorithms given repeated access to data from the same domain.
We study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression.
We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z)
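The entry above studies the explainability-versus-accuracy trade-off for decision trees. As a rough, generic illustration (not the paper's data-driven tuning procedure), the sketch below sweeps the maximum tree depth on a public scikit-learn dataset and reports test accuracy next to the leaf count, a crude proxy for interpretability.

```python
# Rough illustration (not the paper's method): sweep decision-tree depth and report
# accuracy versus number of leaves as a simple interpretability proxy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 3, 5, 8, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: accuracy={tree.score(X_te, y_te):.3f}, "
          f"leaves={tree.get_n_leaves()}")
```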
- A Comparison of Modeling Preprocessing Techniques [0.0]
This paper compares the performance of various data processing methods in terms of predictive performance for structured data.
Three data sets of various structures, interactions, and complexity were constructed.
We compare several methods for feature selection, categorical handling, and null imputation.
arXiv Detail & Related papers (2023-02-23T14:11:08Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- Explaining random forest prediction through diverse rulesets [0.0]
Local Tree eXtractor (LTreeX) is able to explain the forest prediction for a given test instance with a few diverse rules.
We show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
arXiv Detail & Related papers (2022-03-29T12:54:57Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
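For background on the entry above: exact statistical leverage scores of a matrix can be read directly from its left singular vectors, and the rank-revealing and randomized methods in the paper approximate this computation more cheaply. The snippet below is only this exact baseline, not the paper's algorithm.

```python
# Baseline sketch, not the paper's algorithm: exact leverage scores from the
# economy SVD. The i-th leverage score is the squared norm of row i of U_k,
# where U_k spans the dominant column space of A; the scores sum to the rank.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20)) @ rng.standard_normal((20, 50))  # rank-deficient example

U, s, _ = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > s[0] * 1e-10))          # numerical rank
leverage = np.sum(U[:, :k] ** 2, axis=1)   # one score per row of A
print(leverage[:5], "sum equals rank k:", round(float(leverage.sum()), 2))
```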
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
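Several entries above, and the final step of the main abstract, concern turning a trained tree ensemble into human-readable rules. The dedicated explainers (the integer-programming approach, LTreeX) are not reproduced here; as a minimal stand-in, the sketch below prints the decision path that one tree of a fitted Random Forest follows for a single test instance, which already yields an if-then rule a person can read.

```python
# Minimal stand-in for ensemble rule extraction (not the papers' methods): print the
# if-then rule that the first tree of a fitted Random Forest applies to one instance.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

x = data.data[0].reshape(1, -1)                      # one test instance
tree = rf.estimators_[0].tree_                       # low-level tree structure
node_ids = rf.estimators_[0].decision_path(x).indices  # nodes visited, root to leaf

for node in node_ids:
    if tree.children_left[node] == tree.children_right[node]:   # leaf node
        print("=> leaf prediction:", data.target_names[np.argmax(tree.value[node])])
    else:
        name = data.feature_names[tree.feature[node]]
        op = "<=" if x[0, tree.feature[node]] <= tree.threshold[node] else ">"
        print(f"IF {name} {op} {tree.threshold[node]:.3f}")
```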
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.