A Double Machine Learning Trend Model for Citizen Science Data
- URL: http://arxiv.org/abs/2210.15524v2
- Date: Wed, 10 May 2023 13:53:06 GMT
- Title: A Double Machine Learning Trend Model for Citizen Science Data
- Authors: Daniel Fink (1), Alison Johnston (2), Matt Strimas-Mackey (1), Tom
Auer (1), Wesley M. Hochachka (1), Shawn Ligocki (1), Lauren Oldham Jaromczyk
(1), Orin Robinson (1), Chris Wood (1), Steve Kelling (1), and Amanda D.
Rodewald (1) ((1) Cornell Lab of Ornithology, Cornell University, USA (2)
Centre for Research into Ecological and Environmental Modelling, School of
Maths and Statistics, University of St Andrews, St Andrews, UK)
- Abstract summary: We describe a novel modeling approach designed to estimate species population trends while controlling for the interannual confounding common in citizen science data.
The approach is based on Double Machine Learning, a statistical framework that uses machine learning methods to estimate population change and the propensity scores used to adjust for confounding discovered in the data.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: 1. Citizen and community-science (CS) datasets have great potential for
estimating interannual patterns of population change given the large volumes of
data collected globally every year. Yet, the flexible protocols that enable
many CS projects to collect large volumes of data typically lack the structure
necessary to keep consistent sampling across years. This leads to interannual
confounding, as changes to the observation process over time are confounded
with changes in species population sizes.
2. Here we describe a novel modeling approach designed to estimate species
population trends while controlling for the interannual confounding common in
citizen science data. The approach is based on Double Machine Learning, a
statistical framework that uses machine learning methods to estimate population
change and the propensity scores used to adjust for confounding discovered in
the data. Additionally, we develop a simulation method to identify and adjust
for residual confounding missed by the propensity scores. Using this new
method, we can produce spatially detailed trend estimates from citizen science
data.
3. To illustrate the approach, we estimated species trends using data from
the CS project eBird. We used a simulation study to assess the ability of the
method to estimate spatially varying trends in the face of real-world
confounding. Results showed that the trend estimates distinguished between
spatially constant and spatially varying trends at a 27 km resolution. There
were low error rates on the estimated direction of population change
(increasing/decreasing) and high correlations on the estimated magnitude.
4. The ability to estimate spatially explicit trends while accounting for
confounding in citizen science data has the potential to fill important
information gaps, helping to estimate population trends for species, regions,
or seasons without rigorous monitoring data.
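The abstract describes using Double Machine Learning with propensity scores to separate population change from changes in the observation process. The sketch below illustrates only the generic cross-fitted "partialling-out" estimator from the Double Machine Learning framework, on simulated data; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`fit_binned_mean`, `dml_trend`, `simulate`, `naive_difference`), the single confounder `x` (thought of as search effort scaled to [0, 1]), the binary "later year" indicator `t`, and the binned-mean regressor, which stands in for the machine learning models the paper refers to.

```python
import random


def fit_binned_mean(xs, ys, n_bins=20):
    """Stand-in nuisance learner: piecewise-constant regression of y on x in [0, 1]."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_trend(x, t, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of the effect of t on y given x."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Propensity score m(x) = E[t | x] and outcome model l(x) = E[y | x],
        # both fitted without the held-out fold.
        m_hat = fit_binned_mean([x[i] for i in train], [t[i] for i in train])
        l_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        for i in folds[k]:
            t_res = t[i] - m_hat(x[i])
            y_res = y[i] - l_hat(x[i])
            num += t_res * y_res
            den += t_res * t_res
    # Regression of outcome residuals on treatment residuals.
    return num / den


def simulate(n=20000, theta=-0.3, seed=1):
    """Hypothetical data: effort x raises counts and is more common in the later year."""
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.random()
        ti = 1 if rng.random() < 0.2 + 0.6 * xi else 0
        yi = theta * ti + 2.0 * xi * xi + rng.gauss(0.0, 0.5)
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y


def naive_difference(t, y):
    """Unadjusted late-minus-early mean, which ignores the confounder."""
    late = [yi for ti, yi in zip(t, y) if ti == 1]
    early = [yi for ti, yi in zip(t, y) if ti == 0]
    return sum(late) / len(late) - sum(early) / len(early)


x, t, y = simulate()
naive = naive_difference(t, y)
estimate = dml_trend(x, t, y)
print(f"naive difference: {naive:+.3f}")
print(f"DML estimate:     {estimate:+.3f}")
```

In this simulation the true change is a decline (`theta = -0.3`), but effort is higher in the later year, so the unadjusted comparison is pushed upward. Residualizing both the year indicator and the outcome on the confounder before comparing them is what removes that distortion; cross-fitting keeps each prediction out of the sample used to fit its nuisance models.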
Related papers
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
(2024-03-13T16:05:18Z)
- Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered.
(2023-06-19T17:26:34Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
(2023-06-05T03:36:01Z)
- Mapping Urban Population Growth from Sentinel-2 MSI and Census Data Using Deep Learning: A Case Study in Kigali, Rwanda [0.19116784879310023]
We evaluate how deep learning change detection techniques can unravel temporal population dynamics at short intervals.
A ResNet encoder, pretrained on a population mapping task with Sentinel-2 MSI data, was incorporated into a Siamese network.
The network was trained at the census level to accurately predict population change.
(2023-03-15T10:39:31Z)
- Copula-based transferable models for synthetic population generation [1.370096215615823]
Population synthesis involves generating synthetic yet realistic representations of a target population of micro-agents.
Traditional methods, often reliant on target population samples, face limitations due to high costs and small sample sizes.
We propose a novel framework based on copulas to generate synthetic data for target populations where only empirical marginal distributions are known.
(2023-02-17T23:58:14Z)
- Time-Varying Propensity Score to Bridge the Gap between the Past and Present [104.46387765330142]
We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data.
We demonstrate different ways of implementing it and evaluate it on a variety of problems.
(2022-10-04T07:21:49Z)
- So2Sat POP -- A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale [11.38584315242023]
We provide a comprehensive data set for population estimation in 98 European cities.
The data set comprises a digital elevation model, local climate zone, land use proportions, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative.
(2022-04-07T07:30:43Z)
- Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction [1.2862507359003323]
We present a novel deep learning architecture that incorporates theories of spatial statistics into neural networks.
DeepLATTE contains an autocorrelation-guided semi-supervised learning strategy to enforce both local autocorrelation patterns and global autocorrelation trends.
We conduct a demonstration of DeepLATTE using publicly available data for an important public health topic, air quality prediction in a well-fitting, complex physical environment.
(2021-12-10T03:21:19Z)
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning.
To take the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
(2021-10-09T13:56:48Z)
- Census-Independent Population Estimation using Representation Learning [0.5735035463793007]
Census-independent population estimation approaches using alternative data sources have shown promise in providing frequent and reliable population estimates locally.
We explore recent representation learning approaches, and assess the transferability of representations to population estimation in Mozambique.
Using representation learning reduces required human supervision, since features are extracted automatically.
We compare the resulting population estimates to existing population products from GRID3, Facebook (HRSL) and WorldPop.
(2021-10-06T15:13:36Z)
- Magnify Your Population: Statistical Downscaling to Augment the Spatial Resolution of Socioeconomic Census Data [48.7576911714538]
We present a new statistical downscaling approach to derive fine-scale estimates of key socioeconomic attributes.
For each selected socioeconomic variable, a Random Forest model is trained on the source Census units and then used to generate fine-scale gridded predictions.
As a case study, we apply this method to Census data in the United States, downscaling the selected socioeconomic variables available at the block group level, to a grid of 300 spatial resolution.
(2020-06-23T16:52:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.