Related papers: Robust Causality and False Attribution in Data-Driven Earth Science Discoveries

URL: http://arxiv.org/abs/2209.12580v1
Date: Mon, 26 Sep 2022 10:45:48 GMT
Title: Robust Causality and False Attribution in Data-Driven Earth Science Discoveries
Authors: Elizabeth Eldhose (1), Tejasvi Chauhan (1), Vikram Chandel (1), Subimal Ghosh (1 and 2), and Auroop R. Ganguly (3 and 4) ((1) Department of Civil Engineering, Indian Institute of Technology Bombay, Mumbai, India, (2) Interdisciplinary Program in Climate Studies, Indian Institute of Technology Bombay, Mumbai, India, (3) Sustainability and Data Sciences Laboratory, Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA, (4) Pacific Northwest National Laboratory, Richland, WA, USA)
Abstract summary: Causal and attribution studies are essential for earth scientific discoveries and informing climate, ecology, and water policies. Here we show that transfer entropy-based causal graphs can be spurious even when augmented with statistical significance. We develop a subsample-based ensemble approach for robust causality analysis.
Score: 0.3503794925747607
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Causal and attribution studies are essential for earth scientific discoveries and critical for informing climate, ecology, and water policies. However, the current generation of methods needs to keep pace with the complexity of scientific and stakeholder challenges and data availability combined with the adequacy of data-driven methods. Unless carefully informed by physics, they run the risk of conflating correlation with causation or getting overwhelmed by estimation inaccuracies. Given that natural experiments, controlled trials, interventions, and counterfactual examinations are often impractical, information-theoretic methods have been developed and are being continually refined in the earth sciences. Here we show that transfer entropy-based causal graphs, which have recently become popular in the earth sciences with high-profile discoveries, can be spurious even when augmented with statistical significance. We develop a subsample-based ensemble approach for robust causality analysis. Simulated data, and observations in climate and ecohydrology, suggest the robustness and consistency of this approach.
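The abstract names two ingredients, transfer entropy and a subsample-based ensemble, without giving details. The Python sketch below is only an illustration of how such ingredients can be combined and is not the authors' algorithm. Everything specific in it is an assumption: the function names `discretize`, `te_from_triples`, and `edge_frequency`, the lag of one time step, the quantile binning, the permutation surrogates, the 95% threshold, and the 50% subsample fraction.

```python
import numpy as np

def discretize(v, bins):
    """Map a continuous series to integer labels 0..bins-1 using quantile edges."""
    edges = np.quantile(v, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(v, edges)

def te_from_triples(y_next, y_past, x_past, bins):
    """Plug-in transfer entropy X -> Y (bits) from discretized samples, lag 1."""
    counts = np.zeros((bins, bins, bins))
    np.add.at(counts, (y_next, y_past, x_past), 1)
    p = counts / counts.sum()                    # p(y_next, y_past, x_past)
    p_yp_xp = p.sum(axis=0, keepdims=True)       # p(y_past, x_past)
    p_yn_yp = p.sum(axis=2, keepdims=True)       # p(y_next, y_past)
    p_yp = p.sum(axis=(0, 2), keepdims=True)     # p(y_past)
    num = p * p_yp
    den = p_yp_xp * p_yn_yp
    mask = p > 0                                 # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p[mask] * np.log2(num[mask] / den[mask])))

def edge_frequency(x, y, bins=4, n_subsamples=30, n_surrogates=30,
                   frac=0.5, seed=0):
    """Fraction of random subsamples in which TE(X -> Y) beats its surrogates."""
    rng = np.random.default_rng(seed)
    xd, yd = discretize(x, bins), discretize(y, bins)
    y_next, y_past, x_past = yd[1:], yd[:-1], xd[:-1]
    n = len(y_next)
    m = int(frac * n)
    hits = 0
    for _ in range(n_subsamples):
        # Assumed scheme: subsample (y_next, y_past, x_past) triples without replacement.
        idx = rng.choice(n, size=m, replace=False)
        yn, yp, xp = y_next[idx], y_past[idx], x_past[idx]
        te = te_from_triples(yn, yp, xp, bins)
        # Surrogates: permuting x_past breaks any X -> Y link but keeps marginals.
        null = [te_from_triples(yn, yp, rng.permutation(xp), bins)
                for _ in range(n_surrogates)]
        hits += int(te > np.quantile(null, 0.95))
    return hits / n_subsamples

# Toy data (hypothetical): x drives y with a one-step delay; y does not drive x.
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
y = np.empty_like(x)
y[0] = rng.normal()
y[1:] = 0.9 * x[:-1] + 0.3 * rng.normal(size=1999)

print("X -> Y edge frequency:", edge_frequency(x, y))
print("Y -> X edge frequency:", edge_frequency(y, x))
```

In this sketch an edge would be kept only if it is significant in a large share of subsamples, which is one simple way to guard against a link that passes a single significance test by chance.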

Related papers

Building Machine Learning Challenges for Anomaly Detection in Science [94.24422981343699]
We present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains. We present a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable.
arXiv Detail & Related papers (2025-03-03T22:54:07Z)
Discovering Latent Structural Causal Models from Spatio-Temporal Data [23.400027588427964]
We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference. We show that SPACY outperforms state-of-the-art baselines on synthetic data, remains scalable for large grids, and identifies key known phenomena from real-world climate data.
arXiv Detail & Related papers (2024-11-08T05:12:16Z)
Causal Representation Learning in Temporal Data via Single-Parent Decoding [66.34294989334728]
Scientific research often seeks to understand the causal structure underlying high-level variables in a system. Scientists typically collect low-level measurements, such as geographically distributed temperature readings. We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
arXiv Detail & Related papers (2024-10-09T15:57:50Z)
Hypothesizing Missing Causal Variables with LLMs [55.28678224020973]
We formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph. We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect. We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
arXiv Detail & Related papers (2024-09-04T10:37:44Z)
Smoke and Mirrors in Causal Downstream Tasks [59.90654397037007]
This paper looks at the causal inference task of treatment effect estimation, where the outcome of interest is recorded in high-dimensional observations. We compare 6,480 models fine-tuned from state-of-the-art visual backbones, and find that the sampling and modeling choices significantly affect the accuracy of the causal estimate. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones.
arXiv Detail & Related papers (2024-05-27T13:26:34Z)
Causality for Earth Science -- A Review on Time-series and Spatiotemporal Causality Methods [2.790669554650619]
The paper presents an overview of causal discovery and causal inference, explains the underlying causal assumptions, and enlists evaluation techniques. The paper elicits the state-of-the-art methods introduced for time-series and spatiotemporal causal analysis along with their strengths and limitations.
arXiv Detail & Related papers (2024-04-03T14:33:23Z)
A data science axiology: the nature, value, and risks of data science [0.0]
Data science is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery. This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving.
arXiv Detail & Related papers (2023-07-19T21:12:04Z)
Discovering Causal Relations and Equations from Data [23.802778299505288]
This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics. We provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies. Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.
arXiv Detail & Related papers (2023-05-21T19:22:50Z)
GFlowNets for AI-Driven Scientific Discovery [74.27219800878304]
We present a new probabilistic machine learning framework called GFlowNets. GFlowNets can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. We argue that GFlowNets can become a valuable tool for AI-driven scientific discovery.
arXiv Detail & Related papers (2023-02-01T17:29:43Z)
Spatiotemporal modeling of European paleoclimate using doubly sparse Gaussian processes [61.31361524229248]
We build on recent sparse spatiotemporal GPs to reduce the computational burden. We successfully employ such a doubly sparse GP to construct a probabilistic model of paleoclimate.
arXiv Detail & Related papers (2022-11-15T14:15:04Z)
A Data Scientist's Guide to Streamflow Prediction [55.22219308265945]
We focus on the element of hydrologic rainfall-runoff models and their application to forecast floods and predict streamflow. This guide aims to help interested data scientists gain an understanding of the problem, the hydrologic concepts involved, and the details that come up along the way.
arXiv Detail & Related papers (2020-06-05T08:04:37Z)
