Robust Causality and False Attribution in Data-Driven Earth Science
Discoveries
- URL: http://arxiv.org/abs/2209.12580v1
- Date: Mon, 26 Sep 2022 10:45:48 GMT
- Title: Robust Causality and False Attribution in Data-Driven Earth Science
Discoveries
- Authors: Elizabeth Eldhose (1), Tejasvi Chauhan (1), Vikram Chandel (1),
Subimal Ghosh (1 and 2), and Auroop R. Ganguly (3 and 4) ((1) Department of
Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India, (2)
Interdisciplinary Program in Climate Studies, Indian Institute of Technology
Bombay, Mumbai, India, (3) Sustainability and Data Sciences Laboratory,
Department of Civil and Environmental Engineering, Northeastern University,
Boston, MA, USA, (4) Pacific Northwest National Laboratory, Richland, WA,
USA)
- Abstract summary: Causal and attribution studies are essential for earth scientific discoveries and informing climate, ecology, and water policies.
Here we show that transfer entropy-based causal graphs can be spurious even when augmented with statistical significance.
We develop a subsample-based ensemble approach for robust causality analysis.
- Score: 0.3503794925747607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Causal and attribution studies are essential for earth scientific discoveries
and critical for informing climate, ecology, and water policies. However, the
current generation of methods needs to keep pace with the complexity of
scientific and stakeholder challenges and data availability combined with the
adequacy of data-driven methods. Unless carefully informed by physics, they run
the risk of conflating correlation with causation or getting overwhelmed by
estimation inaccuracies. Given that natural experiments, controlled trials,
interventions, and counterfactual examinations are often impractical,
information-theoretic methods have been developed and are being continually
refined in the earth sciences. Here we show that transfer entropy-based causal
graphs, which have recently become popular in the earth sciences with
high-profile discoveries, can be spurious even when augmented with statistical
significance. We develop a subsample-based ensemble approach for robust
causality analysis. Simulated data, and observations in climate and
ecohydrology, suggest the robustness and consistency of this approach.
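The abstract names two ingredients: transfer entropy as the causal measure, and an ensemble over subsamples to test whether a detected link is stable. The Python sketch below illustrates the general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the lag-1 equal-frequency binned estimator, the contiguous-window subsampling, and the permutation-based significance threshold.

```python
import numpy as np

def _entropy(*cols):
    """Plug-in Shannon entropy (nats) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def _discretize(v, bins):
    """Map a series to integer labels 0..bins-1 using equal-frequency bin edges."""
    edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(v, edges)

def transfer_entropy(x, y, bins=4):
    """Lag-1 binned transfer entropy from x to y (illustrative estimator)."""
    xd, yd = _discretize(x, bins), _discretize(y, bins)
    y_next, y_past, x_past = yd[1:], yd[:-1], xd[:-1]
    # TE = H(Y+, Y) - H(Y) - H(Y+, Y, X) + H(Y, X)
    return (_entropy(y_next, y_past) - _entropy(y_past)
            - _entropy(y_next, y_past, x_past) + _entropy(y_past, x_past))

def subsample_te_ensemble(x, y, n_subsamples=50, frac=0.5, bins=4,
                          n_surrogates=20, seed=0):
    """Estimate TE(x -> y) on random contiguous windows of the series.

    Returns the per-window TE values and the fraction of windows in which
    TE exceeds the 95th percentile of a shuffled-source null distribution.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    m = int(frac * n)
    te_vals, detected = [], []
    for _ in range(n_subsamples):
        start = rng.integers(0, n - m + 1)
        xs, ys = x[start:start + m], y[start:start + m]
        te = transfer_entropy(xs, ys, bins)
        # Shuffling the source destroys its temporal link to the target.
        null = [transfer_entropy(rng.permutation(xs), ys, bins)
                for _ in range(n_surrogates)]
        te_vals.append(te)
        detected.append(te > np.quantile(null, 0.95))
    return np.array(te_vals), float(np.mean(detected))

if __name__ == "__main__":
    # Synthetic example: x drives y with a one-step delay.
    rng = np.random.default_rng(1)
    x = rng.normal(size=2000)
    y = np.empty_like(x)
    y[0] = rng.normal()
    y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=1999)
    for name, (src, dst) in {"x->y": (x, y), "y->x": (y, x)}.items():
        te, frac_detected = subsample_te_ensemble(src, dst)
        print(name, "mean TE:", te.mean(), "detection fraction:", frac_detected)
```

In this sketch, a link would be treated as robust only if it is detected in a large fraction of the subsamples, rather than once on the full record. A single significance test on the full series can pass by chance, and the detection fraction is meant to expose that instability.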
Related papers
- Causal Representation Learning in Temporal Data via Single-Parent Decoding [66.34294989334728]
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
arXiv Detail & Related papers (2024-10-09T15:57:50Z)
- Hypothesizing Missing Causal Variables with LLMs [55.28678224020973]
We formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph.
We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect.
We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
arXiv Detail & Related papers (2024-09-04T10:37:44Z)
- Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations.
We compare 6 480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate.
Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
- Causality for Earth Science -- A Review on Time-series and Spatiotemporal Causality Methods [2.790669554650619]
The paper presents an overview of causal discovery and causal inference, explains the underlying causal assumptions, and lists evaluation techniques.
The paper elicits the state-of-the-art methods introduced for time-series and spatiotemporal causal analysis along with their strengths and limitations.
arXiv Detail & Related papers (2024-04-03T14:33:23Z)
- A data science axiology: the nature, value, and risks of data science [0.0]
Data science is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery.
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving.
arXiv Detail & Related papers (2023-07-19T21:12:04Z)
- Discovering Causal Relations and Equations from Data [23.802778299505288]
This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics.
We provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies.
Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.
arXiv Detail & Related papers (2023-05-21T19:22:50Z)
- GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets.
GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop.
We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
- Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent sparse spatiotemporal GPs to reduce the computational burden.
We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z)
- A Data Scientist's Guide to Streamflow Prediction [55.22219308265945]
We focus on the element of hydrologic rainfall--runoff models and their application to forecast floods and predict streamflow.
This guide aims to help interested data scientists gain an understanding of the problem, the hydrologic concepts involved, and the details that come up along the way.
arXiv Detail & Related papers (2020-06-05T08:04:37Z)