Data Fusion for High-Resolution Estimation
- URL: http://arxiv.org/abs/2508.14858v1
- Date: Wed, 20 Aug 2025 17:12:26 GMT
- Title: Data Fusion for High-Resolution Estimation
- Authors: Amy Guan, Marissa Reitsma, Roshni Sahoo, Joshua Salomon, Stefan Wager,
- Abstract summary: High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g. aggregated administrative data) and a potentially biased, high-resolution data source (e.g. individual-level online survey responses). We assume that the potentially biased, high-resolution data source is generated from the population under a model of sampling bias where observables can have arbitrary impact on the probability of response but the difference in the log probabilities of response between units with the same observables is linear in the difference between sufficient statistics of their observables and outcomes. Our data fusion method learns a distribution that is closest (in the sense of KL divergence) to the online survey distribution and consistent with the aggregated administrative data and our model of sampling bias. This method outperforms baselines that rely on either data source alone on a testbed that includes repeated measurements of three indicators measured by both the (online) Household Pulse Survey and ground-truth data sources at two geographic resolutions over the same time period.
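The minimum-KL projection described in the abstract can be illustrated with a simple exponential-tilting sketch: reweight the biased, high-resolution samples so that the weighted mean of their sufficient statistics matches an unbiased low-resolution aggregate. This is a minimal sketch under simplifying assumptions (the sufficient statistic is the outcome itself, and a plain dual gradient descent is used); `kl_tilt_weights` is an illustrative name, not the paper's implementation.

```python
import numpy as np

def kl_tilt_weights(stats, target, n_iter=200, lr=0.5):
    """Minimum-KL reweighting of biased samples via exponential tilting.

    Finds weights w_i proportional to exp(theta @ stats_i) such that the
    weighted mean of `stats` matches `target` (an unbiased aggregate).
    Equivalently, the projection (in KL divergence) of the empirical
    distribution onto the moment constraint E_w[stats] = target.
    """
    stats = np.asarray(stats, dtype=float)    # (n, d) sufficient statistics
    target = np.asarray(target, dtype=float)  # (d,) aggregate to match
    theta = np.zeros(stats.shape[1])          # dual (tilting) parameters

    def weights(theta):
        logits = stats @ theta
        logits -= logits.max()                # numerical stability
        w = np.exp(logits)
        return w / w.sum()

    for _ in range(n_iter):
        w = weights(theta)
        theta -= lr * (w @ stats - target)    # gradient step on the dual
    return weights(theta)

# Toy example: a biased sample over-represents high outcomes; calibrate
# its weighted mean to a known population mean of 0.0.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, size=(1000, 1))       # biased draws
w = kl_tilt_weights(x, target=np.array([0.0]))
print(f"calibrated mean: {float(w @ x[:, 0]):.4f}")
```

After convergence, the weighted mean satisfies the aggregate constraint (to numerical tolerance) while staying as close as possible, in KL divergence, to the original biased sample.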
Related papers
- Conformal Prediction for Multi-Source Detection on a Network [59.17729745907474]
We study the multi-source detection problem: given snapshot observations of node infection status on a graph, estimate the set of source nodes that initiated the propagation. We propose a novel conformal prediction framework that provides statistically valid recall guarantees for source set detection.
arXiv Detail & Related papers (2025-11-12T01:09:56Z)
- A Data-Driven Prism: Multi-View Source Separation with Diffusion Model Priors [0.0]
We show that diffusion models can solve the source separation problem without explicit assumptions about the source. Our method succeeds even when no source is individually observed and the observations are noisy, incomplete, and vary in resolution.
arXiv Detail & Related papers (2025-10-06T18:00:05Z)
- Source-Free Domain-Invariant Performance Prediction [68.39031800809553]
We propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data.
Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability.
Our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation.
arXiv Detail & Related papers (2024-08-05T03:18:58Z)
- Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data. In this work, we show the importance of accurately predicting the bias-conflicting and bias-aligned samples. We propose a new bias identification method based on anomaly detection.
arXiv Detail & Related papers (2024-07-24T17:30:21Z)
- Sourcerer: Sample-based Maximum Entropy Source Distribution Estimation [5.673617376471343]
We propose an approach which targets the maximum entropy distribution, i.e., prioritizes retaining as much uncertainty as possible. Our method is purely sample-based, leveraging the Sliced-Wasserstein distance to measure the discrepancy between the dataset and simulations. To demonstrate the utility of our approach, we infer source distributions for parameters of the Hodgkin-Huxley model from experimental datasets with thousands of single-neuron measurements.
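The Sliced-Wasserstein discrepancy mentioned above admits a compact sample-based estimate: project both sample sets onto random directions and average the resulting one-dimensional Wasserstein distances. This is a generic sketch of that estimator, not the Sourcerer authors' code; the function name and toy data are illustrative.

```python
import numpy as np

def sliced_wasserstein(a, b, n_proj=100, seed=0):
    """Monte-Carlo estimate of the sliced 1-Wasserstein distance.

    For equal-size empirical samples, each 1D Wasserstein distance
    reduces to the mean absolute difference of sorted projections.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    dirs = rng.normal(size=(n_proj, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    pa, pb = a @ dirs.T, b @ dirs.T                      # (n, n_proj)
    pa.sort(axis=0)
    pb.sort(axis=0)
    return float(np.mean(np.abs(pa - pb)))

# Sanity check: samples from the same distribution should be much closer
# than samples from a shifted distribution.
rng = np.random.default_rng(1)
same = sliced_wasserstein(rng.normal(size=(500, 3)), rng.normal(size=(500, 3)))
far = sliced_wasserstein(rng.normal(size=(500, 3)),
                         rng.normal(loc=2.0, size=(500, 3)))
print(same < far)
```

Because it only needs sorting along random projections, this estimator scales to large sample sets where exact multivariate optimal transport would be expensive.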
arXiv Detail & Related papers (2024-02-12T17:13:02Z)
- Interpretable Causal Inference for Analyzing Wearable, Sensor, and Distributional Data [62.56890808004615]
We develop an interpretable method for distributional data analysis that ensures trustworthy and robust decision-making.
We demonstrate ADD MALTS' utility by studying the effectiveness of continuous glucose monitors in mitigating diabetes risks.
arXiv Detail & Related papers (2023-12-17T00:42:42Z)
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Two-Stage Robust and Sparse Distributed Statistical Inference for Large-Scale Data [18.34490939288318]
We address the problem of conducting statistical inference in settings involving large-scale data that may be high-dimensional and contaminated by outliers.
We propose a two-stage distributed and robust statistical inference procedure that copes with high-dimensional models by promoting sparsity.
arXiv Detail & Related papers (2022-08-17T11:17:47Z)
- Exploring the Intrinsic Probability Distribution for Hyperspectral Anomaly Detection [9.653976364051564]
We propose a novel probability distribution representation detector (PDRD) that explores the intrinsic distribution of both the background and the anomalies in original data for hyperspectral anomaly detection.
We conduct experiments on four real data sets to evaluate the performance of the proposed method.
arXiv Detail & Related papers (2021-05-14T11:42:09Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.