Bridging Breiman's Brook: From Algorithmic Modeling to Statistical
Learning
- URL: http://arxiv.org/abs/2102.12328v1
- Date: Tue, 23 Feb 2021 03:38:41 GMT
- Title: Bridging Breiman's Brook: From Algorithmic Modeling to Statistical
Learning
- Authors: Lucas Mentch and Giles Hooker
- Abstract summary: In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures.
Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries.
We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox.
- Score: 6.837936479339647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In 2001, Leo Breiman wrote of a divide between "data modeling" and
"algorithmic modeling" cultures. Twenty years later this division feels far
more ephemeral, both in terms of assigning individuals to camps, and in terms
of intellectual boundaries. We argue that this is largely due to the "data
modelers" incorporating algorithmic methods into their toolbox, particularly
driven by recent developments in the statistical understanding of Breiman's own
Random Forest methods. While this can be simplistically described as "Breiman
won", these same developments also expose the limitations of the
prediction-first philosophy that he espoused, making careful statistical
analysis all the more important. This paper outlines these exciting recent
developments in the random forest literature which, in our view, occurred as a
result of a necessary blending of the two ways of thinking Breiman originally
described. We also ask what areas statistics and statisticians might currently
overlook.
Related papers
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
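The difference-in-differences design mentioned above can be sketched in a few lines. This is a hypothetical illustration of the general econometric idea, not the paper's actual implementation; the function name and the per-instance log-likelihood numbers are made up.

```python
# Difference-in-differences sketch (illustrative, not the paper's code).
# "Treated" models saw the instance during training; "control" models did not.
# Effect = (treated_after - treated_before) - (control_after - control_before)

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Estimate a causal effect (here: memorisation) as a difference-in-differences."""
    return (treated_after - treated_before) - (control_after - control_before)

# Made-up per-instance log-likelihoods before/after a training phase:
effect = diff_in_diff(
    treated_before=-3.2, treated_after=-0.4,   # model trained on the instance
    control_before=-3.1, control_after=-2.9,   # model never trained on it
)
print(round(effect, 2))  # 2.6
```

Subtracting the control group's change removes improvement that would have happened anyway, isolating the effect attributable to training on the instance.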
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Extracting or Guessing? Improving Faithfulness of Event Temporal Relation Extraction [87.04153383938969]
We improve the faithfulness of TempRel extraction models from two perspectives.
The first perspective is to extract genuinely based on contextual description.
The second perspective is to provide proper uncertainty estimation.
arXiv Detail & Related papers (2022-10-10T19:53:13Z)
- The "given data" paradigm undermines both cultures [0.0]
Data is sent into a "black box" by one arrow and emerges from it, transformed into an output, by a second arrow.
Breiman posits two interpretations of this visual as encapsulating a distinction between two cultures in statistics.
In this comment, I argue for a broader perspective on statistics and, in doing so, elevate questions from "before" and "after" the box as fruitful areas for statistical innovation and practice.
arXiv Detail & Related papers (2021-05-26T11:22:06Z)
- Breiman's two cultures: You don't have to choose sides [10.695407438192527]
Breiman's classic paper casts data analysis as a choice between two cultures.
Data modelers use simple, interpretable models with well-understood theoretical properties to analyze data.
Algorithm modelers prioritize predictive accuracy and use more flexible function approximations to analyze data.
arXiv Detail & Related papers (2021-04-25T17:58:46Z)
- Revisiting Rashomon: A Comment on "The Two Cultures" [95.81740983484471]
Breiman dubbed the "Rashomon Effect" the situation in which many models satisfy predictive accuracy criteria equally well but process information in substantially different ways.
This phenomenon can make it difficult to draw conclusions or automate decisions based on a model fit to data.
I make connections to recent work in the Machine Learning literature that explore the implications of this issue.
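The Rashomon Effect described above is easy to see on a toy example. The data and models below are purely illustrative: two classifiers that fit the same data equally well yet rely on entirely different features.

```python
# Illustrative Rashomon Effect sketch (hypothetical toy data).
# Each row: (x1, x2, label). x1 and x2 are perfectly anti-correlated here,
# so either feature alone predicts the label perfectly.
data = [(0, 1, 0), (1, 0, 1), (0, 1, 0), (1, 0, 1)]

def model_a(x1, x2):
    return 1 if x1 > 0.5 else 0      # relies only on feature x1

def model_b(x1, x2):
    return 1 if x2 < 0.5 else 0      # relies only on feature x2

def accuracy(model):
    return sum(model(x1, x2) == y for x1, x2, y in data) / len(data)

print(accuracy(model_a), accuracy(model_b))  # 1.0 1.0
```

Both models achieve identical accuracy, yet any conclusion about "which feature matters" depends entirely on which model happened to be fit, which is precisely why equal predictive performance does not license a unique scientific interpretation.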
arXiv Detail & Related papers (2021-04-05T20:51:58Z)
- Comments on Leo Breiman's paper 'Statistical Modeling: The Two Cultures' (Statistical Science, 2001, 16(3), 199-231) [1.2183405753834562]
Breiman challenged statisticians to think more broadly, to step into the unknown, model-free learning world.
A new frontier has emerged: one where the role, impact, or stability of learning algorithms is no longer measured by predictive quality alone, but by inferential quality.
arXiv Detail & Related papers (2021-03-21T07:40:37Z)
- Why do classifier accuracies show linear trends under distribution shift? [58.40438263312526]
Accuracies of models on one data distribution are approximately linear functions of their accuracies on another distribution.
Our analysis assumes that two models agree in their predictions more often than their accuracy levels alone would imply.
We show that a linear trend must occur when evaluating models on two distributions unless the size of the distribution shift is large.
arXiv Detail & Related papers (2020-12-31T07:24:30Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
- Breiman's "Two Cultures" Revisited and Reconciled [0.0]
There are two cultures of data modeling: parametric statistical modeling and algorithmic machine learning.
The widening gap between "the two cultures" cannot be averted unless we find a way to blend them into a coherent whole.
This article presents a solution by establishing a link between the two cultures.
arXiv Detail & Related papers (2020-05-27T19:02:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.