Is machine learning good or bad for the natural sciences?
- URL: http://arxiv.org/abs/2405.18095v2
- Date: Fri, 31 May 2024 22:28:18 GMT
- Title: Is machine learning good or bad for the natural sciences?
- Authors: David W. Hogg, Soledad Villar
- Abstract summary: We show that there are contexts in which introducing ML creates strong, unwanted statistical biases.
For one, when ML models are used to emulate physical (or first-principles) simulations, they amplify confirmation biases.
For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) methods are having a huge impact across all of the sciences. However, ML has a strong ontology - in which only the data exist - and a strong epistemology - in which a model is considered good if it performs well on held-out training data. These philosophies are in strong conflict with both standard practices and key philosophies in the natural sciences. Here we identify some locations for ML in the natural sciences at which the ontology and epistemology are valuable. For example, when an expressive machine learning model is used in a causal inference to represent the effects of confounders, such as foregrounds, backgrounds, or instrument calibration parameters, the model capacity and loose philosophy of ML can make the results more trustworthy. We also show that there are contexts in which the introduction of ML introduces strong, unwanted statistical biases. For one, when ML models are used to emulate physical (or first-principles) simulations, they amplify confirmation biases. For another, when expressive regressions are used to label datasets, those labels cannot be used in downstream joint or ensemble analyses without taking on uncontrolled biases. The question in the title is being asked of all of the natural sciences; that is, we are calling on the scientific communities to take a step back and consider the role and value of ML in their fields; the (partial) answers we give here come from the particular perspective of physics.
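The abstract's point about regression-derived labels can be seen in a toy setting. The following is a minimal Python sketch under our own assumptions: a linear-Gaussian model, with an ordinary least-squares fit standing in for an "expressive regression". The function `label_bias_demo` and its parameters are hypothetical and not taken from the paper. A regression estimates the conditional mean of the label, so its predictions are shrunk toward the population mean. As a result, population-level (ensemble) statistics computed from the predicted labels are biased, even though in this Gaussian toy model the linear fit is the minimum-mean-squared-error predictor of each individual label.

```python
import numpy as np


def label_bias_demo(n=100_000, noise_sd=1.0, threshold=1.5, seed=0):
    """Toy illustration (hypothetical setup): regression labels are shrunk toward the mean.

    True labels y ~ N(0, 1); the observed feature is x = y + N(0, noise_sd^2).
    A straight-line least-squares fit of y on x is trained on one half of the
    data and then used to "label" the other half.
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, 1.0, size=n)
    x = y + rng.normal(0.0, noise_sd, size=n)

    half = n // 2
    # Design matrix [x, 1] for fitting y = slope * x + intercept on the labeled half.
    design = np.column_stack([x[:half], np.ones(half)])
    (slope, intercept), *_ = np.linalg.lstsq(design, y[:half], rcond=None)

    # Apply the fitted regression to the held-out half to produce labels.
    y_true = y[half:]
    y_label = slope * x[half:] + intercept

    return {
        "slope": slope,
        "var_true": y_true.var(),                     # spread of the true labels
        "var_label": y_label.var(),                   # spread of the regression labels
        "tail_true": np.mean(y_true > threshold),     # true fraction above threshold
        "tail_label": np.mean(y_label > threshold),   # fraction implied by regression labels
    }


result = label_bias_demo()
print(result)
```

With the defaults (`noise_sd=1.0`), x has variance 2 and covariance 1 with y. The fitted slope should therefore come out near 0.5, and the predicted labels should have roughly half the variance of the true labels (about 0.5 versus 1). The fraction of labels above `threshold` shrinks accordingly. This is the kind of downstream bias the abstract warns about when regression labels feed into joint or ensemble analyses.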
Related papers
- Scientific Machine Learning Seismology
Scientific machine learning (SciML) is an interdisciplinary research field that integrates machine learning, particularly deep learning, with physics theory to understand and predict complex natural phenomena.
Physics-informed neural networks (PINNs) and neural operators (NOs) are two popular methods for SciML.
The use of PINNs is expanding into areas such as simultaneous solutions of differential equations, inference in underdetermined systems, and regularization based on physics.
Date: 2024-09-27
- Machine Learning and Theory Ladenness -- A Phenomenological Account
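We argue that both positions in the debate over theory ladenness (that ML-assisted science is as theory-laden as traditional science, and that ML methods are theory-independent or even theory-free) are overly simplistic and do not advance our understanding of the interplay between ML methods and domain theories.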
Our analysis reveals that, while the construction of models can be relatively independent of domain theory, the practical implementation and interpretation of these models within a specific domain still rely on fundamental theoretical assumptions and background knowledge.
Date: 2024-09-17
- A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence
We present a novel framework for thinking about the deployment of machine learning models in a performative, human-ML collaborative system.
In our framework, the introduction of ML recommendations changes the data-generating process of human decisions.
We find that for many levels of ML performance, humans can improve upon the ML predictions.
Date: 2024-05-22
- AI Model Disgorgement: Methods and Choices
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate what it means to "remove the effects" of data from a trained model without retraining it from scratch.
Date: 2023-04-07
- Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective
We show that flexible non-linear models do not always improve upon linear regression models to which transforms and interactions between variables have been added manually.
We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
Date: 2022-11-21
- Learning Physical Dynamics with Subequivariant Graph Neural Networks
Graph Neural Networks (GNNs) have become a prevailing tool for learning physical dynamics.
Physical laws obey symmetries, which provide a vital inductive bias for model generalization.
Our model improves contact prediction accuracy by over 3% on average across 8 scenarios on Physion and achieves 2x lower rollout MSE on RigidFall.
Date: 2022-10-13
- The Need for Interpretable Features: Motivation and Taxonomy
We claim that the term "interpretable feature" is neither specific nor detailed enough to capture the full extent to which features affect the usefulness of machine learning explanations.
In this paper, we motivate and discuss three key lessons: 1) more attention should be given to what we refer to as the interpretable feature space, or the state of features that are useful to domain experts taking real-world actions.
Date: 2022-02-23
- A Review of Physics-based Machine Learning in Civil Engineering
Machine learning (ML) is a significant tool that can be applied across many disciplines.
ML models for civil engineering applications that are developed using laboratory simulations often fail in real-world tests.
This paper reviews the history of physics-based ML and its application in civil engineering.
Date: 2021-10-09
- Insights into Performance Fitness and Error Metrics for Machine Learning
Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis.
This paper examines a number of the most commonly used performance fitness and error metrics for regression and classification algorithms.
Date: 2020-05-17
- An Information-Theoretic Approach to Personalized Explainable Machine Learning
We propose a simple probabilistic model for the predictions and user knowledge.
We quantify the effect of an explanation by the conditional mutual information between the explanation and prediction.
Date: 2020-03-01