Big Issues for Big Data: challenges for critical spatial data analytics
- URL: http://arxiv.org/abs/2007.11281v2
- Date: Tue, 11 Aug 2020 12:56:18 GMT
- Title: Big Issues for Big Data: challenges for critical spatial data analytics
- Authors: Chris Brunsdon and Alexis Comber
- Abstract summary: We focus on a set of challenges underlying the collection and analysis of big data.
We consider the issues related to inference when working with usually biased big data.
In particular we consider the need to place individual data science studies in wider social and economic contexts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we consider some of the issues of working with big data and big
spatial data and highlight the need for an open and critical framework. We
focus on a set of challenges underlying the collection and analysis of big
data. In particular, we consider 1) the issues related to inference when
working with usually biased big data, challenging the assumed inferential
superiority of data with observations, n, approaching N, the population (n->N),
and the need for data science analyses that answer questions of practical
significance or with greater emphasis on the size of the effect, rather than the
truth or falsehood of a statistical statement; 2) the need to accept messiness
in your data and to document all operations undertaken on the data because of
this, in support of openness and reproducibility paradigms; and 3) the need to
explicitly seek to understand the causes of bias, messiness, etc. in the data and
the inferential consequences of using such data in analyses, by adopting
critical approaches to spatial data science. In particular we consider the need
to place individual data science studies in wider social and economic
contexts, along with the role of inferential theory in the presence of big data,
and issues relating to messiness and complexity in big data.
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
arXiv Detail & Related papers (2024-04-26T09:51:24Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- Continual Causal Effect Estimation: Challenges and Opportunities [11.343298687766579]
A further understanding of cause and effect within observational data is critical across many domains.
The existing methods mainly focus on source-specific and stationary observational data.
In the era of big data, we face new challenges in causal inference with observational data.
arXiv Detail & Related papers (2023-01-03T09:57:50Z)
- FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Occams Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
A new trend towards analytic complexity represents a severe challenge for the principle of parsimony, or Occam's Razor, in science.
Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time.
The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
- Opening practice: supporting Reproducibility and Critical spatial data science [0.0]
This paper reflects on a number of trends towards a more open and reproducible approach to spatial data science.
In particular it considers trends towards Big Data, and the impacts this is having on spatial data analysis and modelling.
It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering 'black boxes'.
arXiv Detail & Related papers (2020-07-20T07:50:08Z)
- Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects.
Main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, and an improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)
- A Philosophy of Data [91.3755431537592]
We work from the fundamental properties necessary for statistical computation to a definition of statistical data.
We argue that the need for useful data to be commensurable rules out an understanding of properties as fundamentally unique or equal.
With our increasing reliance on data and data technologies, these two characteristics of data affect our collective conception of reality.
arXiv Detail & Related papers (2020-04-15T14:47:24Z)