SpaCE: The Spatial Confounding Environment
- URL: http://arxiv.org/abs/2312.00710v2
- Date: Wed, 6 Dec 2023 02:00:53 GMT
- Title: SpaCE: The Spatial Confounding Environment
- Authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Woodward, Jie Kate Hu, Naeem Khoshnevis, Francesca Dominici
- Abstract summary: SpaCE provides realistic benchmark datasets and tools for evaluating causal inference methods.
Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores.
SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models.
- Score: 2.572906392867547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatial confounding poses a significant challenge in scientific studies
involving spatial data, where unobserved spatial variables can influence both
treatment and outcome, possibly leading to spurious associations. To address
this problem, we introduce SpaCE: The Spatial Confounding Environment, the
first toolkit to provide realistic benchmark datasets and tools for
systematically evaluating causal inference methods designed to alleviate
spatial confounding. Each dataset includes training data, true counterfactuals,
a spatial graph with coordinates, and smoothness and confounding scores
characterizing the effect of a missing spatial confounder. It also includes
realistic semi-synthetic outcomes and counterfactuals, generated using
state-of-the-art machine learning ensembles, following best practices for
causal inference benchmarks. The datasets cover real treatment and covariates
from diverse domains, including climate, health, and social sciences. SpaCE
facilitates an automated end-to-end pipeline that simplifies data loading,
experimental setup, and the evaluation of machine learning and causal
inference models. The SpaCE project provides several dozen datasets of diverse
sizes and spatial complexity. It is publicly available as a Python package,
encouraging community feedback and contributions.
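The spatial confounding problem described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not SpaCE's API or its data-generating process: the grid layout, the linear outcome model, the noise levels, and the helper names `simulate`, `slope`, `residualize`, and `estimate_effects` are all hypothetical. It shows how a spatially smooth variable that drives both treatment and outcome inflates a naive effect estimate when it is left out, and how adjusting for it recovers the true effect.

```python
import random
from statistics import mean


def simulate(n_side=20, effect=1.0, seed=0):
    """Return (u, t, y) triples for every cell of an n_side x n_side grid.

    u is a spatially smooth confounder (a gradient across the grid),
    t is the treatment and y is the outcome. All coefficients are
    illustrative assumptions, not values taken from SpaCE.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_side):
        for j in range(n_side):
            u = (i + j) / (2 * (n_side - 1))  # smooth in space, in [0, 1]
            t = u + rng.gauss(0, 0.3)  # confounder shifts the treatment
            y = effect * t + 2.0 * u + rng.gauss(0, 0.3)  # and the outcome
            rows.append((u, t, y))
    return rows


def slope(x, y):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, u):
    """Remove the linear dependence of v on u and return the residuals."""
    b = slope(u, v)
    mu, mv = mean(u), mean(v)
    return [vi - mv - b * (ui - mu) for vi, ui in zip(v, u)]


def estimate_effects(rows):
    """Return (naive, adjusted) estimates of the treatment effect.

    naive ignores the confounder; adjusted partials it out of both
    treatment and outcome before regressing (Frisch-Waugh-Lovell).
    """
    u = [r[0] for r in rows]
    t = [r[1] for r in rows]
    y = [r[2] for r in rows]
    naive = slope(t, y)
    adjusted = slope(residualize(t, u), residualize(y, u))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = estimate_effects(simulate())
    print(f"true effect: 1.0, naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In practice the confounder is unobserved, so the adjusted estimate here is only available because the simulation exposes `u`. Benchmarks with known counterfactuals, as the abstract describes, make it possible to measure how close a method gets to that reference without access to the missing variable.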
Related papers
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia.
Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions.
Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
arXiv Detail & Related papers (2024-10-14T17:22:55Z)
- FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction [47.336599393600046]
FedNE is a novel approach that integrates the FedAvg framework with the contrastive NE technique.
We conduct comprehensive experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-09-17T19:23:24Z)
- Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
- Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms [17.802456388479616]
We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia.
This dataset presents a challenging task due to the overlap and distribution of grass species.
The dataset and code will be made publicly available, aiming to drive research in computer vision, machine learning, and ecological studies.
arXiv Detail & Related papers (2024-07-25T18:27:27Z)
- Self-consistent Deep Geometric Learning for Heterogeneous Multi-source Spatial Point Data Prediction [10.646376827353551]
Multi-source spatial point data prediction is crucial in fields like environmental monitoring and natural resource management.
Existing models in this area often fall short due to their domain-specific nature and lack a strategy for integrating information from various sources.
We introduce an innovative multi-source spatial point data prediction framework that adeptly aligns information from varied sources without relying on ground truth labels.
arXiv Detail & Related papers (2024-06-30T16:13:13Z)
- SSIN: Self-Supervised Learning for Rainfall Spatial Interpolation [37.212272184144]
We propose a data-driven self-supervised learning framework for rainfall spatial analysis.
By mining latent spatial patterns from historical data, SpaFormer can learn informative embeddings for raw data and then adaptively model spatial correlations.
Our method outperforms the state-of-the-art solutions in experiments on two real-world raingauge datasets.
arXiv Detail & Related papers (2023-11-27T04:23:47Z)
- SPADES: A Realistic Spacecraft Pose Estimation Dataset using Event Sensing [9.583223655096077]
Due to limited access to real target datasets, algorithms are often trained using synthetic data and applied in the real domain.
Event sensing has been explored in the past and shown to reduce the domain gap between simulations and real-world scenarios.
We introduce a novel dataset, SPADES, comprising real event data acquired in a controlled laboratory environment and simulated event data using the same camera intrinsics.
arXiv Detail & Related papers (2023-11-09T12:14:47Z)
- Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol Particles for Frontier Exploration [55.41644538483948]
This paper introduces a multimodal dataset from the harsh and unstructured underground environment with aerosol particles.
It contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format.
The focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on captured data.
arXiv Detail & Related papers (2023-04-27T20:21:18Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for Federated Learning on Non-IID Data [69.0785021613868]
Federated learning is a distributed machine learning approach which enables a shared server model to learn by aggregating the locally-computed parameter updates with the training data from spatially-distributed client silos.
We propose the Federated Invariant Learning Consistency (FedILC) approach, which leverages the gradient covariance and the geometric mean of Hessians to capture both inter-silo and intra-silo consistencies.
This is relevant to various fields such as medical healthcare, computer vision, and the Internet of Things (IoT).
arXiv Detail & Related papers (2022-05-19T03:32:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.