Revisiting Rashomon: A Comment on "The Two Cultures"
- URL: http://arxiv.org/abs/2104.02150v1
- Date: Mon, 5 Apr 2021 20:51:58 GMT
- Title: Revisiting Rashomon: A Comment on "The Two Cultures"
- Authors: Alexander D'Amour
- Abstract summary: The paper focuses on what Breiman dubbed the "Rashomon Effect": the situation in which there are many models that satisfy predictive accuracy criteria equally well, but process information in substantially different ways.
This phenomenon can make it difficult to draw conclusions or automate decisions based on a model fit to data.
I make connections to recent work in the Machine Learning literature that explores the implications of this issue.
- Score: 95.81740983484471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Here, I provide some reflections on Prof. Leo Breiman's "The Two Cultures"
paper. I focus specifically on the phenomenon that Breiman dubbed the "Rashomon
Effect", describing the situation in which there are many models that satisfy
predictive accuracy criteria equally well, but process information in the data
in substantially different ways. This phenomenon can make it difficult to draw
conclusions or automate decisions based on a model fit to data. I make
connections to recent work in the Machine Learning literature that explores the
implications of this issue, and note that grappling with it can be a fruitful
area of collaboration between the algorithmic and data modeling cultures.
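To make the Rashomon Effect concrete, here is a minimal sketch (illustrative only, not code from the paper): two off-the-shelf scikit-learn models reach nearly identical held-out accuracy on the same synthetic data, yet rank the features quite differently.

```python
# Illustrative sketch of the Rashomon Effect (not code from the paper):
# two models with near-equal predictive accuracy that nevertheless
# process the information in the features in very different ways.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Near-identical predictive accuracy ...
print("logistic regression accuracy:", lr.score(X_te, y_te))
print("random forest accuracy:      ", rf.score(X_te, y_te))

# ... but substantially different feature rankings.
print("top features (LR):", np.argsort(-np.abs(lr.coef_[0]))[:5])
print("top features (RF):", np.argsort(-rf.feature_importances_)[:5])
```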
Related papers
- Amazing Things Come From Having Many Good Models [15.832860655980918]
The Rashomon Effect describes the phenomenon that there exist many equally good predictive models for the same dataset.
This perspective piece proposes reshaping the way we think about machine learning.
Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.
arXiv Detail & Related papers (2024-07-05T20:14:36Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate membership inference (MI) attacks.
We exploit text similarity and the model's resistance to document modifications as potential MI signals (a minimal sketch of the similarity signal follows this entry).
We discuss several safeguards for training summarization models that protect against MI attacks, and the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
- Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
In many application scenarios, improper video summarization can have a large impact, so modeling explainability is a key concern.
A Causal Explainer, dubbed Causalainer, is proposed to address this issue.
arXiv Detail & Related papers (2023-04-30T11:42:06Z) - Bridging the Data Gap between Training and Inference for Unsupervised
Neural Machine Translation [49.916963624249355]
A UNMT model is trained on pseudo-parallel data with translated source sentences, yet receives natural source sentences at inference time.
This source discrepancy between training and inference hinders the translation performance of UNMT models.
We propose an online self-training approach that also uses the pseudo-parallel pair {natural source, translated target} to mimic the inference scenario (see the sketch after this entry).
arXiv Detail & Related papers (2022-03-16T04:50:27Z) - Breiman's two cultures: You don't have to choose sides [10.695407438192527]
- Breiman's two cultures: You don't have to choose sides [10.695407438192527]
Breiman's classic paper casts data analysis as a choice between two cultures.
Data modelers use simple, interpretable models with well-understood theoretical properties to analyze data.
Algorithm modelers prioritize predictive accuracy and use more flexible function approximations to analyze data.
arXiv Detail & Related papers (2021-04-25T17:58:46Z) - Bridging Breiman's Brook: From Algorithmic Modeling to Statistical
Learning [6.837936479339647]
In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures.
Twenty years later this division feels far more ephemeral, both in terms of assigning individuals to camps, and in terms of intellectual boundaries.
We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox.
arXiv Detail & Related papers (2021-02-23T03:38:41Z) - Distilling Double Descent [65.85258126760502]
Distillation is the technique of training a "student" model on examples that are labeled by a separate "teacher" model.
We show that even when the teacher model is highly overparameterized and provides hard labels, using a very large held-out unlabeled dataset can result in a student that outperforms more "traditional" approaches (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-02-13T02:26:48Z) - Disembodied Machine Learning: On the Illusion of Objectivity in NLP [21.169778613387827]
- Disembodied Machine Learning: On the Illusion of Objectivity in NLP [21.169778613387827]
We argue that fully addressing and mitigating biases is near-impossible.
We find the prevalent discourse of bias limiting in its ability to address social marginalisation.
We recommend being conscientious of this, and accepting that de-biasing methods correct for only a fraction of biases.
arXiv Detail & Related papers (2021-01-28T12:58:39Z) - The Extraordinary Failure of Complement Coercion Crowdsourcing [50.599433903377374]
Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years.
We aim to collect annotated data for this phenomenon by reducing it to either of two known tasks: Explicit Completion and Natural Language Inference.
In both cases, crowdsourcing resulted in low agreement scores, even though we followed the same methodologies as in previous work.
arXiv Detail & Related papers (2020-10-12T19:04:04Z) - Breiman's "Two Cultures" Revisited and Reconciled [0.0]
Breiman described two cultures of data modeling: parametric statistical modeling and algorithmic machine learning.
The widening gap between "the two cultures" cannot be averted unless we find a way to blend them into a coherent whole.
This article presents a solution by establishing a link between the two cultures.
arXiv Detail & Related papers (2020-05-27T19:02:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.