Opening practice: supporting Reproducibility and Critical spatial data
science
- URL: http://arxiv.org/abs/2008.03256v1
- Date: Mon, 20 Jul 2020 07:50:08 GMT
- Title: Opening practice: supporting Reproducibility and Critical spatial data
science
- Authors: Chris Brunsdon and Alexis Comber
- Abstract summary: This paper reflects on a number of trends towards a more open and reproducible approach to spatial data science.
In particular it considers trends towards Big Data, and the impacts this is having on spatial data analysis and modelling.
It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering 'black boxes'.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper reflects on a number of trends towards a more open and
reproducible approach to geographic and spatial data science over recent years.
In particular it considers trends towards Big Data, and the impacts this is
having on spatial data analysis and modelling. It identifies a turn in academia
towards coding as a core analytic tool, and away from proprietary software
tools offering 'black boxes' where the internal workings of the analysis are
not revealed. It argues that this closed form software is problematic, and
considers a number of ways in which issues identified in spatial data analysis
(such as the MAUP) could be overlooked when working with closed tools, leading
to problems of interpretation and possibly inappropriate actions and policies
based on these. In addition, this paper considers the role that
reproducible and open spatial science may play in such an approach, taking into
account the issues raised. It highlights the dangers of failing to account for
the geographical properties of data, now that all data are spatial (they are
collected somewhere), the problems of a desire for n=all observations in data
science and it identifies the need for a critical approach. This is one in
which openness, transparency, sharing and reproducibility provide a mantra for
defensible and robust spatial data science.
Related papers
- ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation [37.73074657448699]
ManiBox is a novel bounding-box-guided manipulation method built on a simulation-based teacher-student framework.
ManiBox demonstrates a marked improvement in spatial grasping generalization and adaptability to diverse objects and backgrounds.
arXiv Detail & Related papers (2024-11-04T07:05:02Z)
- Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z)
- Spatial-temporal Forecasting for Regions without Observations [13.805203053973772]
We study spatial-temporal forecasting for a region of interest without any historical observations.
We propose a model named STSM for the task.
Our key insight is to learn from the locations that resemble those in the region of interest.
arXiv Detail & Related papers (2024-01-19T06:26:05Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning.
We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Occam's Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
The new trend towards analytic complexity represents a severe challenge for the principle of parsimony, or Occam's Razor, in science.
Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time.
The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
- Big Issues for Big Data: challenges for critical spatial data analytics [0.0]
We focus on a set of challenges underlying the collection and analysis of big data.
We consider the issues related to inference when working with usually biased big data.
In particular we consider the need to place individual data science studies in wider social and economic contexts.
arXiv Detail & Related papers (2020-07-22T09:11:56Z)
- Wide-Area Data Analytics [4.080171822768553]
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations.
The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019.
This report summarizes the challenges discussed and the conclusions generated at the workshop.
arXiv Detail & Related papers (2020-06-17T22:44:33Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)