Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A Survey
- URL: http://arxiv.org/abs/2103.12019v1
- Date: Mon, 22 Mar 2021 17:22:30 GMT
- Authors: Yiqun Xie, Shashi Shekhar, Yan Li
- Abstract summary: Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results.
We present an up-to-date and detailed review of the models and algorithms developed in this field.
- Score: 5.169783325693032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mapping of spatial hotspots, i.e., regions with significantly higher rates or
probability density of generating certain events (e.g., disease or crime
cases), is an important task in diverse societal domains, including public
health, public safety, transportation, agriculture, environmental science, etc.
Clustering techniques required by these domains differ from traditional
clustering methods due to the high economic and social costs of spurious
results (e.g., false alarms of crime clusters). As a result, statistical rigor
is needed explicitly to control the rate of spurious detections. To address
this challenge, techniques for statistically-robust clustering have been
extensively studied by the data mining and statistics communities. In this
survey, we present an up-to-date and detailed review of the models and
algorithms developed in this field. We first present a general taxonomy of the
clustering process with statistical rigor, covering key steps of data and
statistical modeling, region enumeration and maximization, significance
testing, and data update. We further discuss different paradigms and methods
within each of the key steps. Finally, we highlight research gaps and potential
future directions, which may serve as a stepping stone in generating new ideas
and thoughts in this growing field and beyond.
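As a rough illustration of the taxonomy above (statistical modeling, region enumeration and maximization, and significance testing), the following Python sketch implements one classic instance: a Kulldorff-style spatial scan statistic under a Poisson model. Candidate circles are grown around each location, and a Monte Carlo randomization test is applied. This is not code from the survey. The function names and the parameters `max_frac`, `n_sims`, and `seed` are hypothetical choices made for this example, and every location is assumed to have a positive population.

```python
import math
import random

# Illustrative sketch of a Kulldorff-style Poisson spatial scan statistic.
# Function names and parameters (max_frac, n_sims, seed) are hypothetical.


def poisson_llr(c, e, total_cases):
    """Log-likelihood ratio of a region with c observed and e expected cases.

    Only regions with an elevated rate (c > e) are hotspot candidates;
    all others score 0.
    """
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    outside = total_cases - c
    if outside > 0:
        llr += outside * math.log(outside / (total_cases - e))
    return llr


def candidate_regions(coords, pop, max_frac=0.5):
    """Enumerate circular regions: around each location, add neighbors in
    order of distance while the region holds at most max_frac of the total
    population. Duplicate regions are removed."""
    total = sum(pop)
    seen = set()
    for xi, yi in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - xi) ** 2 + (coords[j][1] - yi) ** 2)
        members, p = [], 0
        for j in order:
            if p + pop[j] > max_frac * total:
                break
            members.append(j)
            p += pop[j]
            seen.add(tuple(sorted(members)))
    return sorted(seen)


def max_llr(cases, pop, regions):
    """Return the highest log-likelihood ratio and the region achieving it."""
    total_cases, total_pop = sum(cases), sum(pop)
    best, best_region = 0.0, None
    for r in regions:
        c = sum(cases[j] for j in r)
        e = total_cases * sum(pop[j] for j in r) / total_pop
        llr = poisson_llr(c, e, total_cases)
        if llr > best:
            best, best_region = llr, r
    return best, best_region


def scan_test(coords, cases, pop, n_sims=999, max_frac=0.5, seed=0):
    """Find the most likely hotspot and its Monte Carlo p-value."""
    rng = random.Random(seed)
    regions = candidate_regions(coords, pop, max_frac)
    observed, region = max_llr(cases, pop, regions)
    n, total_cases = len(cases), sum(cases)
    exceed = 0
    for _ in range(n_sims):
        # Null hypothesis: cases fall on locations in proportion to population.
        sim = [0] * n
        for j in rng.choices(range(n), weights=pop, k=total_cases):
            sim[j] += 1
        # Compare against the maximum over all candidate regions, which
        # accounts for having searched many regions.
        if max_llr(sim, pop, regions)[0] >= observed:
            exceed += 1
    return region, observed, (exceed + 1) / (n_sims + 1)


if __name__ == "__main__":
    # Toy 5x5 grid with equal populations and elevated counts in one 2x2 corner.
    coords = [(x, y) for y in range(5) for x in range(5)]
    pop = [100] * len(coords)
    cases = [8 if x < 2 and y < 2 else 1 for x, y in coords]
    region, llr, p = scan_test(coords, cases, pop, n_sims=199)
    print(region, round(llr, 2), p)
```

Each simulated dataset is scored by its maximum over the same set of candidate regions. The resulting p-value therefore reflects the search over all regions, which helps control the rate of spurious detections. The data-update step from the taxonomy is not covered by this sketch.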
Related papers
- Active Target Discovery under Uninformative Prior: The Power of Permanent and Transient Memory [26.488250231429774]
In many scientific and engineering fields, where acquiring high-quality data is expensive, strategic sampling of unobserved regions is crucial for maximizing discovery rates within a constrained budget.
We propose a novel approach that enables effective active target discovery even in settings with uninformative priors.
Unlike black-box policies, our approach is inherently interpretable, providing clear insights into decision-making.
(2025-10-19T00:42:56Z)
- Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.
Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.
This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
(2025-06-11T03:29:18Z)
- Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events [5.619671817895425]
Extreme events, such as market crashes, natural disasters, and pandemics, are rare but catastrophic.
While data-driven methods offer powerful capabilities for extreme event modeling, they require abundant training data, yet extreme event data is inherently scarce.
This survey provides the first overview of synthetic data generation for extreme events.
(2025-06-04T20:21:23Z)
- Out-of-Distribution Detection on Graphs: A Survey [58.47395497985277]
Graph out-of-distribution (GOOD) detection focuses on identifying graph data that deviates from the distribution seen during training.
We categorize existing methods into four types: enhancement-based, reconstruction-based, information propagation-based, and classification-based approaches.
We discuss practical applications and theoretical foundations, highlighting the unique challenges posed by graph data.
(2025-02-12T04:07:12Z)
- Online Model-based Anomaly Detection in Multivariate Time Series: Taxonomy, Survey, Research Challenges and Future Directions [0.017476232824732776]
Time-series anomaly detection plays an important role in engineering processes.
This survey introduces a novel taxonomy where a distinction between online and offline, and training and inference is made.
It presents the most popular data sets and evaluation metrics used in the literature, as well as a detailed analysis.
(2024-08-07T13:01:10Z)
- A step towards the integration of machine learning and small area estimation [0.0]
We propose a predictor supported by machine learning algorithms which can be used to predict any population or subpopulation characteristics.
We study only small departures from the assumed model, to show that our proposal is a good alternative in this case as well.
In addition, we propose a method for estimating the accuracy of machine learning predictors, which makes it possible to compare their accuracy with that of classic methods.
(2024-02-12T09:43:17Z)
- Improving Link Prediction in Social Networks Using Local and Global Features: A Clustering-based Approach [0.0]
We propose an approach based on the combination of first and second group methods to tackle the link prediction problem.
Our two-phase method first determines new features related to the position and dynamic behavior of nodes.
Then, a subspace clustering algorithm is applied to group social objects based on the computed similarity measures.
(2023-05-17T14:45:02Z)
- Detection and Evaluation of Clusters within Sequential Data [58.720142291102135]
Clustering algorithms for Block Markov Chains possess theoretical optimality guarantees.
In particular, our sequential data is derived from human DNA, written text, animal movement data and financial markets.
It is found that the Block Markov Chain model assumption can indeed produce meaningful insights in exploratory data analyses.
(2022-10-04T15:22:39Z)
- Data-Centric Epidemic Forecasting: A Survey [56.99209141838794]
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
(2022-07-19T16:15:11Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
(2022-07-18T11:38:32Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
(2021-12-15T18:56:39Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
(2021-01-18T22:55:40Z)
- Event Prediction in the Big Data Era: A Systematic Survey [7.3810864598379755]
Event prediction is becoming a viable option in the big data era.
This paper aims to provide a systematic and comprehensive survey of the technologies, applications, and evaluations of event prediction.
(2020-07-19T23:24:52Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
(2020-02-05T21:35:29Z)