Related papers: Refining CART Models for Covariate Shift with Importance Weight

Robust Molecular Property Prediction via Densifying Scarce Labeled Data
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data. We demonstrate significant performance gains on challenging real-world datasets.
arXiv (2025-06-13T15:27:40Z)
DeCaf: A Causal Decoupling Framework for OOD Generalization on Node Classification
Graph Neural Networks (GNNs) are susceptible to distribution shifts, which creates vulnerability and security issues in critical domains. Existing methods that target learning an invariant (feature, structure)-label mapping often depend on oversimplified assumptions about the data generation process. We introduce a more realistic graph data generation model using Structural Causal Models (SCMs). We propose a causal decoupling framework, DeCaf, that independently learns unbiased feature-label and structure-label mappings.
arXiv (2024-10-27T00:22:18Z)
Optimizing importance weighting in the presence of sub-population shifts
A distribution shift between the training and test data can severely harm the performance of machine learning models. We argue that existing methods for determining the weights are suboptimal, as they neglect the increase in the variance of the estimated model due to the finite sample size of the training data. We propose a bi-level optimization procedure in which the weights and model parameters are optimized simultaneously.
arXiv (2024-10-18T09:21:10Z)
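For context, the standard weighting for covariate shift is the density ratio w(x) = p_test(x) / p_train(x), plugged into a weighted empirical risk. The Python sketch below shows only that textbook baseline, not the paper's bi-level procedure. It assumes both input densities are known 1-D Gaussians and that the model is a line fitted by weighted least squares; all function names are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of the 1-D normal distribution N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def importance_weights(x, train_mean, train_std, test_mean, test_std):
    """Covariate-shift weights w(x) = p_test(x) / p_train(x).

    Assumes both densities are known Gaussians (illustrative only; in
    practice the ratio has to be estimated from data).
    """
    return gaussian_pdf(x, test_mean, test_std) / gaussian_pdf(x, train_mean, train_std)


def weighted_least_squares(x, y, w):
    """Fit y ~ b0 + b1 * x by minimizing sum_i w_i * (y_i - b0 - b1 * x_i)^2."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales column i of X^T by w_i, i.e. X^T diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)


rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=200)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=200)
w = importance_weights(x_train, 0.0, 1.0, 1.5, 0.5)
print(weighted_least_squares(x_train, y_train, w))
```

With the test density narrower than the training density, as here, the weights stay bounded. When the ratio is heavy-tailed, a few large weights dominate the fit and its variance grows, which is the finite-sample effect the abstract points to.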
Generative Principal Component Regression via Variational Inference
One approach to designing appropriate manipulations is to target key features of predictive models. We develop a novel objective based on supervised variational autoencoders (SVAEs) that enforces that such information is represented in the latent space. We show in simulations that generative PCR (gPCR) dramatically improves target selection for manipulation compared with standard PCR and SVAEs.
arXiv (2024-09-03T22:38:55Z)
Quantifying Distribution Shifts and Uncertainties for Enhanced Model Robustness in Machine Learning Applications
This study explores model adaptation and generalization by utilizing synthetic data. We employ quantitative measures such as Kullback-Leibler divergence, Jensen-Shannon distance, and Mahalanobis distance to assess data similarity. Our findings suggest that utilizing statistical measures, such as the Mahalanobis distance, to determine whether model predictions fall within the low-error "interpolation regime" or the high-error "extrapolation regime" provides a complementary method for assessing distribution shift and model uncertainty.
arXiv (2024-05-03T10:05:31Z)
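The interpolation/extrapolation split can be illustrated with a short Python sketch. It fits the mean and covariance of the training features, then flags new points whose Mahalanobis distance exceeds a cutoff. The 95th-percentile cutoff and the function names are assumptions for illustration, not details taken from the study.

```python
import numpy as np


def fit_reference(X_train):
    """Mean and (pseudo-)inverse covariance of training features of shape (n, d)."""
    mu = X_train.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_train, rowvar=False))  # keep 2-D even when d == 1
    return mu, np.linalg.pinv(cov)


def mahalanobis(X, mu, precision):
    """Row-wise distance sqrt((x - mu)^T S^-1 (x - mu))."""
    diff = X - mu
    sq = np.einsum("ij,jk,ik->i", diff, precision, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off


def extrapolation_flags(X_train, X_new, quantile=0.95):
    """True where a new point lies farther out than `quantile` of the training points.

    The default 0.95 cutoff is a hypothetical choice.
    """
    mu, precision = fit_reference(X_train)
    cutoff = np.quantile(mahalanobis(X_train, mu, precision), quantile)
    return mahalanobis(X_new, mu, precision) > cutoff


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_new = np.array([[0.1, -0.2], [6.0, 6.0]])
print(extrapolation_flags(X_train, X_new))
```

Using the training-set distances to set the cutoff keeps the threshold in the same units as the data's own spread, so "extrapolation" means "farther out than most training points" rather than an absolute distance.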
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv (2023-10-03T17:37:52Z)
Understanding Calibration of Deep Neural Networks for Medical Image Classification
This study explores model performance and calibration under different training regimes. We consider fully supervised training, as well as a rotation-based self-supervised method with and without transfer learning. Our study reveals that factors such as weight distributions and the similarity of learned representations correlate with the calibration trends observed in the models.
arXiv (2023-09-22T18:36:07Z)
Characterizing Out-of-Distribution Error via Optimal Transport
Methods of predicting a model's performance on OOD data without labels are important for machine learning safety. We introduce a novel method for estimating model performance by leveraging optimal transport theory. We show that our approaches significantly outperform existing state-of-the-art methods with an up to 3x lower prediction error.
arXiv (2023-05-25T01:37:13Z)
Vector-Based Data Improves Left-Right Eye-Tracking Classifier Performance After a Covariate Distributional Shift
We propose a fine-grain data approach for EEG-ET data collection in order to create more robust benchmarking. We train machine learning models utilizing both coarse-grain and fine-grain data and compare their accuracies when tested on data of similar/different distributional patterns. Results showed that models trained on fine-grain, vector-based data were less susceptible to distributional shifts than models trained on coarse-grain, binary-classified data.
arXiv (2022-07-31T16:27:50Z)
Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data
Training models on data with a high imbalance rate (class density discrepancy) may lead to suboptimal predictions. We propose a framework for training models that addresses this imbalance issue. We demonstrate our model's improved performance on real-world medical datasets.
arXiv (2022-07-23T00:39:53Z)
Undersmoothing Causal Estimators with Generative Trees
Inferring individualised treatment effects from observational data can unlock the potential for targeted interventions. It is, however, hard to infer these effects from observational data. In this paper, we explore a novel generative tree based approach that tackles model misspecification directly.
arXiv (2022-03-16T11:59:38Z)
FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes
We propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set where the sample weights are computed. We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv (2022-01-15T05:14:48Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv (2022-01-11T23:01:12Z)
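A minimal Python sketch of the thresholded-confidence idea described above. It assumes the confidence score is the maximum softmax probability (one possible choice of score) and that correctness is measured on a labeled source split. The threshold is placed at the source-score quantile equal to the source error rate, so the fraction of source examples above it matches source accuracy. Function names are hypothetical.

```python
import numpy as np


def max_confidence(probs):
    """Score each example by its maximum softmax probability; probs has shape (n, k)."""
    return np.max(probs, axis=1)


def fit_atc_threshold(source_scores, source_correct):
    """Pick t so the fraction of source scores below t equals the source error rate."""
    source_error = 1.0 - np.mean(source_correct)
    return np.quantile(source_scores, source_error)


def atc_estimate(target_scores, threshold):
    """Predicted target accuracy: fraction of unlabeled target scores above t."""
    return np.mean(target_scores > threshold)


source_probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]])
source_labels = np.array([0, 1, 1, 0])
correct = source_probs.argmax(axis=1) == source_labels
t = fit_atc_threshold(max_confidence(source_probs), correct)

target_probs = np.array([[0.7, 0.3], [0.52, 0.48], [0.2, 0.8]])
print(atc_estimate(max_confidence(target_probs), t))
```

Only the source split needs labels; the target side uses nothing but the model's scores on unlabeled examples.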
Causal Effect Variational Autoencoder with Uniform Treatment
Causal effect variational autoencoders (CEVAEs) are trained to predict the outcome given observational treatment data. Uniform treatment variational autoencoders (UTVAEs) are instead trained with a uniform treatment distribution using importance sampling.
arXiv (2021-11-16T17:40:57Z)
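Reweighting observed treatments toward a uniform treatment distribution is, in its generic importance-sampling form, a per-example ratio w_i = (1/K) / p(t_i | x_i). The Python sketch below shows that generic ratio only; it assumes propensity scores p(t | x) are known or already estimated, and it is not claimed to match the UTVAE training scheme in detail. Function names are hypothetical.

```python
import numpy as np


def uniform_treatment_weights(propensity, treatment, n_treatments):
    """Importance weights from the observational policy to a uniform one.

    propensity: array (n, K) with p(t = k | x_i), assumed known or estimated.
    treatment:  integer array (n,) with the observed treatment t_i.
    Returns w_i = (1 / K) / p(t_i | x_i).
    """
    observed = propensity[np.arange(len(treatment)), treatment]
    return (1.0 / n_treatments) / observed


def weighted_mean_loss(losses, weights, self_normalize=True):
    """Importance-weighted average of per-example losses (optionally self-normalized)."""
    if self_normalize:
        return np.sum(weights * losses) / np.sum(weights)
    return np.mean(weights * losses)


propensity = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]])
treatment = np.array([0, 0, 1])
w = uniform_treatment_weights(propensity, treatment, n_treatments=2)
losses = np.array([0.4, 1.2, 0.7])
print(w, weighted_mean_loss(losses, w))
```

Treatments that were rarely assigned for a given x receive large weights, so self-normalizing the weighted loss is a common way to keep its variance in check.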
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet. We also investigate cases where the correlation is weaker, for instance, some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv (2021-07-09T19:48:23Z)
Predicting with Confidence on Unseen Distributions
We connect the domain adaptation and predictive uncertainty literatures to predict model accuracy on challenging unseen distributions. We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts. We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv (2021-07-07T15:50:18Z)
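In its simplest reading, DoC is the drop in average confidence (maximum softmax probability) from the source to the target set, and that drop is taken as the estimated accuracy change. A minimal Python sketch of that direct estimate, assuming softmax outputs for both sets and a known source accuracy; the function names are hypothetical.

```python
import numpy as np


def average_confidence(probs):
    """Mean of the maximum softmax probability over predictions of shape (n, k)."""
    return float(np.mean(np.max(probs, axis=1)))


def difference_of_confidences(source_probs, target_probs):
    """DoC: average source confidence minus average target confidence."""
    return average_confidence(source_probs) - average_confidence(target_probs)


def doc_accuracy_estimate(source_accuracy, source_probs, target_probs):
    """Direct DoC estimate: subtract the confidence drop from the labeled source accuracy."""
    return source_accuracy - difference_of_confidences(source_probs, target_probs)


source_probs = np.array([[0.95, 0.05], [0.85, 0.15], [0.2, 0.8]])
target_probs = np.array([[0.6, 0.4], [0.45, 0.55], [0.7, 0.3]])
print(doc_accuracy_estimate(0.92, source_probs, target_probs))
```

Only the source accuracy requires labels; the target side needs just the model's predicted probabilities.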
Accurate and Robust Feature Importance Estimation under Distribution Shifts
PRoFILE is a novel feature importance estimation method. We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv (2020-09-30T05:29:01Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv (2020-06-26T13:50:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.