Robust Causality and False Attribution in Data-Driven Earth Science
Discoveries
- URL: http://arxiv.org/abs/2209.12580v1
- Date: Mon, 26 Sep 2022 10:45:48 GMT
- Title: Robust Causality and False Attribution in Data-Driven Earth Science
Discoveries
- Authors: Elizabeth Eldhose (1), Tejasvi Chauhan (1), Vikram Chandel (1),
Subimal Ghosh (1 and 2), and Auroop R. Ganguly (3 and 4) ((1) Department of
Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India, (2)
Interdisciplinary Program in Climate Studies, Indian Institute of Technology
Bombay, Mumbai, India, (3) Sustainability and Data Sciences Laboratory,
Department of Civil and Environmental Engineering, Northeastern University,
Boston, MA, USA, (4) Pacific Northwest National Laboratory, Richland, WA,
USA)
- Abstract summary: Causal and attribution studies are essential for earth scientific discoveries and informing climate, ecology, and water policies.
Here we show that transfer entropy-based causal graphs can be spurious even when augmented with statistical significance.
We develop a subsample-based ensemble approach for robust causality analysis.
- Score: 0.3503794925747607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal and attribution studies are essential for earth scientific discoveries
and critical for informing climate, ecology, and water policies. However, the
current generation of methods needs to keep pace with the complexity of
scientific and stakeholder challenges and data availability combined with the
adequacy of data-driven methods. Unless carefully informed by physics, they run
the risk of conflating correlation with causation or getting overwhelmed by
estimation inaccuracies. Given that natural experiments, controlled trials,
interventions, and counterfactual examinations are often impractical,
information-theoretic methods have been developed and are being continually
refined in the earth sciences. Here we show that transfer entropy-based causal
graphs, which have recently become popular in the earth sciences with
high-profile discoveries, can be spurious even when augmented with statistical
significance. We develop a subsample-based ensemble approach for robust
causality analysis. Simulated data, and observations in climate and
ecohydrology, suggest the robustness and consistency of this approach.
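The abstract does not spell out the estimator or the ensemble procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a lag-1, equal-frequency binned plug-in estimate of transfer entropy, contiguous-block subsampling, and permutation surrogates for significance. The function names (`transfer_entropy`, `ensemble_link_frequency`) and all parameters are hypothetical. A link is scored by how often it is significant across subsamples, rather than by a single significance test on the full record.

```python
import numpy as np


def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of lag-1 transfer entropy X -> Y, in bits.

    Assumption: series are discretised into equal-frequency bins.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def discretise(v):
        # Interior quantiles give `bins` roughly equally populated classes.
        edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(v, edges)

    xd, yd = discretise(x), discretise(y)
    y_next, y_now, x_now = yd[1:], yd[:-1], xd[:-1]

    # Joint counts over (y_{t+1}, y_t, x_t).
    joint = np.zeros((bins, bins, bins))
    np.add.at(joint, (y_next, y_now, x_now), 1.0)
    p_abc = joint / joint.sum()

    p_ab = p_abc.sum(axis=2, keepdims=True)      # p(y_{t+1}, y_t)
    p_bc = p_abc.sum(axis=0, keepdims=True)      # p(y_t, x_t)
    p_b = p_abc.sum(axis=(0, 2), keepdims=True)  # p(y_t)

    # TE = sum p(a,b,c) * log2[ p(a,b,c) p(b) / (p(a,b) p(b,c)) ]
    mask = p_abc > 0
    ratio = (p_abc * p_b)[mask] / (p_ab * p_bc)[mask]
    return float(np.sum(p_abc[mask] * np.log2(ratio)))


def ensemble_link_frequency(x, y, n_subsamples=50, block_frac=0.5,
                            n_surrogates=50, alpha=0.05, bins=4, seed=0):
    """Fraction of contiguous subsamples in which TE(X -> Y) is significant.

    Assumption: significance is judged against surrogates in which the
    source series is randomly permuted within the subsample.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    m = int(block_frac * n)

    hits = 0
    for _ in range(n_subsamples):
        start = rng.integers(0, n - m + 1)
        xs, ys = x[start:start + m], y[start:start + m]
        te = transfer_entropy(xs, ys, bins)
        null = [transfer_entropy(rng.permutation(xs), ys, bins)
                for _ in range(n_surrogates)]
        if te > np.quantile(null, 1 - alpha):
            hits += 1
    return hits / n_subsamples


if __name__ == "__main__":
    # Synthetic example: x drives y with a one-step delay.
    rng = np.random.default_rng(42)
    x = rng.normal(size=2000)
    y = np.zeros_like(x)
    y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=x.size - 1)
    print("x -> y link frequency:", ensemble_link_frequency(x, y))
    print("y -> x link frequency:", ensemble_link_frequency(y, x))
```

In a sketch like this, a link would be retained only if its frequency across subsamples is high. The threshold, block length, and bin count are tuning choices that the abstract does not specify.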
Related papers
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation.
We assume binary effects that are recorded as high-dimensional images in a Randomized Controlled Trial.
We compare 6,480 models fine-tuned from state-of-the-art visual backbones.
We find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Causality for Earth Science -- A Review on Time-series and Spatiotemporal Causality Methods [2.790669554650619]
The paper presents an overview of causal discovery and causal inference, explains the underlying causal assumptions, and lists evaluation techniques.
The paper describes the state-of-the-art methods introduced for time-series and spatiotemporal causal analysis along with their strengths and limitations.
arXiv Detail & Related papers (2024-04-03T14:33:23Z)
- A data science axiology: the nature, value, and risks of data science [0.0]
Data science is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery.
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving.
arXiv Detail & Related papers (2023-07-19T21:12:04Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
- Discovering Causal Relations and Equations from Data [23.802778299505288]
This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics.
We provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies.
Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.
arXiv Detail & Related papers (2023-05-21T19:22:50Z)
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
- Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent scalable spatiotemporal GPs to reduce the computational burden.
We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions [34.19821413853115]
Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics.
We develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding.
We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data.
arXiv Detail & Related papers (2022-04-21T11:15:10Z)
- A Data Scientist's Guide to Streamflow Prediction [55.22219308265945]
We focus on the element of hydrologic rainfall--runoff models and their application to forecast floods and predict streamflow.
This guide aims to help interested data scientists gain an understanding of the problem, the hydrologic concepts involved, and the details that come up along the way.
arXiv Detail & Related papers (2020-06-05T08:04:37Z)