Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities
- URL: http://arxiv.org/abs/2308.01475v1
- Date: Wed, 2 Aug 2023 23:57:31 GMT
- Authors: Genevera I. Allen, Luqin Gan, Lili Zheng
- Abstract summary: We discuss and review the field of interpretable machine learning.
We outline the types of discoveries that can be made using Interpretable Machine Learning.
We focus on the grand challenge of how to validate these discoveries in a data-driven manner.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: New technologies have led to vast troves of large and complex datasets across
many scientific domains and industries. People routinely use machine learning
techniques to not only process, visualize, and make predictions from this big
data, but also to make data-driven discoveries. These discoveries are often
made using Interpretable Machine Learning, or machine learning models and
techniques that yield human understandable insights. In this paper, we discuss
and review the field of interpretable machine learning, focusing especially on
the techniques as they are often employed to generate new knowledge or make
discoveries from large data sets. We outline the types of discoveries that can
be made using Interpretable Machine Learning in both supervised and
unsupervised settings. Additionally, we focus on the grand challenge of how to
validate these discoveries in a data-driven manner, which promotes trust in
machine learning systems and reproducibility in science. We discuss validation
from both a practical perspective, reviewing approaches based on data splitting
and stability, and a theoretical perspective, reviewing statistical
results on model selection consistency and uncertainty quantification via
statistical inference. Finally, we conclude by highlighting open challenges in
using interpretable machine learning techniques to make discoveries, including
gaps between theory and practice for validating data-driven discoveries.