Related papers: Learning covariate importance for matching in policy-relevant observational research

Learning covariate importance for matching in policy-relevant observational research

URL: http://arxiv.org/abs/2403.12367v2
Date: Fri, 29 Aug 2025 04:17:06 GMT
Title: Learning covariate importance for matching in policy-relevant observational research
Authors: Hongzhe Zhang, Jiasheng Shi, Jing Huang,
Abstract summary: We propose the Priority-Aware one-to-one Matching Algorithm (PAMA)<n>It is a semi-supervised framework that learns a covariate importance measure from a subset data of units that are paired by experts and uses it to match additional units.<n>It is applied to a real-world study of in-person schooling and COVID-19 transmission.
Score: 2.6361497319422176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Matching methods are widely used to reduce confounding effects in observational studies, but conventional approaches often treat all covariates as equally important, which can result in poor performance when covariates differ in their relevance to the study. We propose the Priority-Aware one-to-one Matching Algorithm (PAMA), a novel semi-supervised framework that learns a covariate importance measure from a subset data of units that are paired by experts and uses it to match additional units. It optimizes a weighted quadratic score that reflects the relevance between each covariate and the study, and iteratively updates the covariate importance measure in the score function using unlabeled data. PAMA is model-free, but we have established that the covariate importance measure -- the learned weights -- is consistent when the oracle matching rule aligns with the design. In addition, we introduce extensions that address imbalanced data, accommodate temporal covariates, and improve robustness to mispaired observations. In simulations, PAMA outperforms standard methods, particularly in high-dimensional settings and under model misspecification. Applied to a real-world study of in-person schooling and COVID-19 transmission, PAMA recovers nearly twice as many expert-designated matches as competing methods using baseline covariates. A self-taught learning extension improves performance in simulations, though its benefit is context-dependent. To our knowledge, PAMA is the first framework to apply semi-supervised learning to observational matching with covariates of unequal relevance. It offers a scalable and interpretable tool for incorporating expert insight into policy-relevant observational research.

Related papers

Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations [76.2226569692207]
Correlations between input parameters play a crucial role in many scientific classification tasks.<n>We present a new adversarial attack algorithm called Random Distribution Shuffle Attack (RDSA)<n>We demonstrate the RDSA effectiveness on six classification tasks.
arXiv Detail & Related papers (2025-01-09T21:45:09Z)
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models. We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
arXiv Detail & Related papers (2024-11-29T05:29:47Z)
Spuriousness-Aware Meta-Learning for Learning Robust Classifiers [26.544938760265136]
Spurious correlations are brittle associations between certain attributes of inputs and target variables. Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold. Mitigating the impact of spurious correlations is crucial towards robust model generalization, but it often requires annotations of the spurious correlations in data.
arXiv Detail & Related papers (2024-06-15T21:41:25Z)
A Novel Ranking Scheme for the Performance Analysis of Stochastic Optimization Algorithms using the Principles of Severity [9.310464457958844]
We provide a novel ranking scheme to rank the algorithms over multiple single-objective optimization problems. The results of the algorithms are compared using a robust bootstrapping-based hypothesis testing procedure.
arXiv Detail & Related papers (2024-05-31T19:35:34Z)
Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation [26.544938760265136]
Deep neural classifiers rely on spurious correlations between spurious attributes of inputs and targets to make predictions. We propose a self-guided spurious correlation mitigation framework. We show that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori.
arXiv Detail & Related papers (2024-05-06T17:12:21Z)
A General Causal Inference Framework for Cross-Sectional Observational Data [0.4972323953932129]
General Causal Inference (GCI) framework specifically designed for cross-sectional observational data. This paper proposes a GCI framework specifically designed for cross-sectional observational data.
arXiv Detail & Related papers (2024-04-28T14:26:27Z)
Trade-off Between Dependence and Complexity for Nonparametric Learning -- an Empirical Process Approach [10.27974860479791]
In many applications where the data exhibit temporal dependencies, the corresponding empirical processes are much less understood. We present a general bound on the expected supremum of empirical processes under standard $beta/rho$-mixing assumptions. We show that even under long-range dependence, it is possible to attain the same rates as in the i.i.d. setting.
arXiv Detail & Related papers (2024-01-17T05:08:37Z)
Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice. We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z)
On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection [55.73320979733527]
We propose a data-agnostic adversarial detection framework, which induces different responses between normal and adversarial samples to UAPs. Experimental results show that our method achieves competitive detection performance on various text classification tasks.
arXiv Detail & Related papers (2023-06-27T02:54:07Z)
Fine-grained Correlation Loss for Regression [20.175415393263037]
We propose to revisit the classic regression tasks with novel investigations on directly optimizing the fine-grained correlation losses. We extensively validate our method on two typical ultrasound image regression tasks, including the image quality assessment and bio-metric measurement.
arXiv Detail & Related papers (2022-07-01T11:25:50Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Deep Probabilistic Graph Matching [72.6690550634166]
We propose a deep learning-based graph matching framework that works for the original QAP without compromising on the matching constraints. The proposed method is evaluated on three popularly tested benchmarks (Pascal VOC, Willow Object and SPair-71k) and it outperforms all previous state-of-the-arts on all benchmarks.
arXiv Detail & Related papers (2022-01-05T13:37:27Z)
ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples. We observe that there exist natural correlations and anti-correlations among human-object interactions. We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z)
Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues. We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank. Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
arXiv Detail & Related papers (2021-05-23T19:21:55Z)
Improved Algorithms for Agnostic Pool-based Active Classification [20.12178157010804]
We consider active learning for binary classification in the agnostic pool-based setting. Our algorithm is superior to state of the art active learning algorithms on image classification datasets.
arXiv Detail & Related papers (2021-05-13T18:24:30Z)
Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples. We observe that there exist natural correlations and anti-correlations among human-object interactions. We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings. We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms. We also propose the first algorithm for linear bandits in the the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z)
Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data. There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups. We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc. Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource. We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances. We present a general probabilistic framework that accommodates a variety of aggregate observations. Simple maximum likelihood solutions can be applied to various differentiable models.
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
Ranking a set of objects: a graph based least-square approach [70.7866286425868]
We consider the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of equal workers. We propose a class of non-adaptive ranking algorithms that rely on a least-squares intrinsic optimization criterion for the estimation of qualities.
arXiv Detail & Related papers (2020-02-26T16:19:09Z)
Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives [10.291482850329892]
Mixed integer programming (MIP) can be used to solve (to optimality) $ell_0$-regularized regression problems. We propose two classes of scalable algorithms: an exact algorithm that can handlepapprox 50,000$ features in a few minutes, and approximate algorithms that can address instances with $papprox6$. In addition, we present new estimation error bounds for $ell$-regularizeds.
arXiv Detail & Related papers (2020-01-17T18:47:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.