Related papers: A Unifying Framework for Robust and Efficient Inference with Unstructured Data

URL: http://arxiv.org/abs/2505.00282v1
Date: Thu, 01 May 2025 04:11:25 GMT
Title: A Unifying Framework for Robust and Efficient Inference with Unstructured Data
Authors: Jacob Carlson, Melissa Dell
Abstract summary: This paper presents a general framework for conducting efficient and robust inference on parameters derived from unstructured data. We formalize this approach with MARS (Missing At Random Structured Data), a unifying framework that integrates and extends existing methods for debiased inference. We develop robust and efficient estimators for both descriptive and causal estimands and address challenges such as inference using aggregated and transformed predictions from unstructured data.
Score: 2.07180164747172
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a general framework for conducting efficient and robust inference on parameters derived from unstructured data, which include text, images, audio, and video. Economists have long incorporated data extracted from texts and images into their analyses, a practice that has accelerated with advancements in deep neural networks. However, neural networks do not generically produce unbiased predictions, potentially propagating bias to estimators that use their outputs. To address this challenge, we reframe inference with unstructured data as a missing structured data problem, where structured data are imputed from unstructured inputs using deep neural networks. This perspective allows us to apply classic results from semiparametric inference, yielding valid, efficient, and robust estimators based on unstructured data. We formalize this approach with MARS (Missing At Random Structured Data), a unifying framework that integrates and extends existing methods for debiased inference using machine learning predictions, linking them to a variety of older, familiar problems such as causal inference. We develop robust and efficient estimators for both descriptive and causal estimands and address challenges such as inference using aggregated and transformed predictions from unstructured data. Importantly, MARS applies to common empirical settings that have received limited attention in the existing literature. Finally, we reanalyze prominent studies that use unstructured data, demonstrating the practical value of MARS.
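
The missing-data framing in the abstract can be illustrated with a small simulation. The sketch below is not the paper's MARS estimator; it is a standard augmented inverse-probability-weighted (AIPW) mean estimator, written under the assumptions that a random subset of records is hand-labeled with a known probability `pi` and that the model's prediction is available for every record. All names (`debiased_mean`, `pi`, the simulated bias of 0.5) are hypothetical and chosen for illustration only.

```python
import random
import statistics


def debiased_mean(predictions, labels, labeled, pi):
    """AIPW-style estimate of E[Y] when Y is observed only on a random subset.

    predictions: model predictions for every record (possibly biased)
    labels:      true values where labeled, None elsewhere
    labeled:     booleans marking which records were hand-labeled
    pi:          known probability that a record is labeled (an assumption)
    """
    scores = []
    for pred, y, r in zip(predictions, labels, labeled):
        # Labeled records add a reweighted residual that offsets prediction bias.
        correction = (y - pred) / pi if r else 0.0
        scores.append(pred + correction)
    estimate = sum(scores) / len(scores)
    std_error = statistics.stdev(scores) / len(scores) ** 0.5
    return estimate, std_error


# Simulated data: the true mean of Y is 1.0, and predictions are shifted up by 0.5.
rng = random.Random(0)
n, pi = 20000, 0.2
truth = [rng.gauss(1.0, 1.0) for _ in range(n)]
predictions = [y + 0.5 + rng.gauss(0.0, 0.5) for y in truth]
labeled = [rng.random() < pi for _ in range(n)]
labels = [y if r else None for y, r in zip(truth, labeled)]

naive = sum(predictions) / n  # inherits the prediction bias
estimate, std_error = debiased_mean(predictions, labels, labeled, pi)
print(f"naive={naive:.3f} debiased={estimate:.3f} se={std_error:.3f}")
```

In this toy setting the naive average of predictions carries the full prediction bias, while the corrected estimate centers on the true mean at the cost of a larger standard error. The design choice is the usual one for this class of estimators: predictions supply precision, and the labeled subset supplies the bias correction.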

Related papers

Personalized Treatment Effect Estimation from Unstructured Data [8.468367158186007]
We introduce an approximate 'plug-in' method trained directly on the neural representations of unstructured data. We then introduce two theoretically grounded estimators that leverage structured measurements of the confounders during training. Our experiments on two benchmark datasets show that the plug-in method, directly trainable on large unstructured datasets, achieves strong empirical performance across all settings.
arXiv Detail & Related papers (2025-07-28T16:52:31Z)
Financial Data Analysis with Robust Federated Logistic Regression [7.68275287892947]
In this study, we focus on the analysis of financial data in a federated setting, wherein data is distributed across multiple clients or locations. We propose a robust federated logistic regression-based framework that strives to strike a balance between these goals.
arXiv Detail & Related papers (2025-04-28T20:42:24Z)
Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z)
Meta-Statistical Learning: Supervised Learning of Statistical Inference [59.463430294611626]
This work demonstrates that the tools and principles driving the success of large language models (LLMs) can be repurposed to tackle distribution-level tasks. We propose meta-statistical learning, a framework inspired by multi-instance learning that reformulates statistical inference tasks as supervised learning problems.
arXiv Detail & Related papers (2025-02-17T18:04:39Z)
Towards the generation of hierarchical attack models from cybersecurity vulnerabilities using language models [3.7548609506798494]
This paper investigates the use of a pre-trained language model and siamese network to discern sibling relationships between text-based cybersecurity vulnerability data.
arXiv Detail & Related papers (2024-10-07T13:05:33Z)
Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL). It exploits the dependency between a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into an ensembled score with weights. The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
Graph Structure Learning with Interpretable Bayesian Neural Networks [10.957528713294874]
We introduce novel iterations with independently interpretable parameters. These parameters influence characteristics of the estimated graph, such as edge sparsity. After unrolling these iterations, prior knowledge over such graph characteristics shapes prior distributions. Fast execution and parameter efficiency allow for high-fidelity posterior approximation.
arXiv Detail & Related papers (2024-06-20T23:27:41Z)
Implicit Generative Prior for Bayesian Neural Networks [8.013264410621357]
We propose a novel neural adaptive empirical Bayes (NA-EB) framework for complex data structures. The proposed NA-EB framework combines variational inference with a gradient ascent algorithm. We demonstrate the practical applications of our framework through extensive evaluations on a variety of tasks.
arXiv Detail & Related papers (2024-04-27T21:00:38Z)
REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training [49.581884130880944]
Deep neural networks (DNNs) have proven effective in various domains. However, they often struggle to perform well on certain minority groups during inference.
arXiv Detail & Related papers (2023-12-05T16:27:54Z)
Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks. The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data. Empirical results on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
Boosting Event Extraction with Denoised Structure-to-Text Augmentation [52.21703002404442]
Event extraction aims to recognize pre-defined event triggers and arguments from texts. Recent data augmentation methods often neglect the problem of grammatical incorrectness. We propose a denoised structure-to-text augmentation framework for event extraction (DAEE).
arXiv Detail & Related papers (2023-05-16T16:52:07Z)
DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. We propose a general framework to solve the above two challenges simultaneously. We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
Multi-Modal Causal Inference with Deep Structural Equation Models [3.5271614282612314]
We develop techniques that leverage unstructured data within causal inference to correct for confounders that may otherwise not be accounted for. We empirically demonstrate on tasks in genomics and healthcare that unstructured data can be used to correct for diverse sources of confounding.
arXiv Detail & Related papers (2022-03-18T00:44:36Z)
Learning Output Embeddings in Structured Prediction [73.99064151691597]
A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension. A prediction in the original space is computed by solving a pre-image problem. In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space.
arXiv Detail & Related papers (2020-07-29T09:32:53Z)
Complex Sequential Data Analysis: A Systematic Literature Review of Existing Algorithms [0.9649642656207869]
This paper reviews past approaches to the use of deep-learning frameworks for the analysis of irregular-patterned datasets. Traditional deep-learning methods perform poorly or even fail when trying to analyse these datasets. The performance of deep-learning frameworks was found to be evaluated mainly using mean absolute error and root mean square error accuracy metrics.
arXiv Detail & Related papers (2020-07-22T17:53:00Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and gradients thereof, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that, compared to data augmentation, feature averaging reduces generalization error when used with convex losses and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.