Addressing Distribution Shift in RTB Markets via Exponential Tilting
- URL: http://arxiv.org/abs/2308.07424v1
- Date: Mon, 14 Aug 2023 19:31:58 GMT
- Title: Addressing Distribution Shift in RTB Markets via Exponential Tilting
- Authors: Minji Kim, Seong Jin Lee, Bumsik Kim
- Abstract summary: This paper introduces the Exponential Tilt Reweighting Alignment (ExTRA) algorithm to address distribution shifts in data.
A notable advantage of this method is its ability to operate using labeled source data and unlabeled target data.
Through simulated real-world data, we investigate the nature of distribution shift and evaluate the applicability of the proposed model.
- Score: 2.883257292731477
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distribution shift in machine learning models can be a primary cause of
performance degradation. This paper delves into the characteristics of these
shifts, primarily motivated by Real-Time Bidding (RTB) market models. We
emphasize the challenges posed by class imbalance and sample selection bias,
both potent instigators of distribution shifts. This paper introduces the
Exponential Tilt Reweighting Alignment (ExTRA) algorithm, as proposed by Marty
et al. (2023), to address distribution shifts in data. The ExTRA method is
designed to determine the importance weights on the source data, aiming to
minimize the KL divergence between the weighted source and target datasets. A
notable advantage of this method is its ability to operate using labeled source
data and unlabeled target data. Through simulated real-world data, we
investigate the nature of distribution shift and evaluate the applicability of the
proposed model.
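Code sketch (illustrative): the following is a minimal numpy sketch of exponential-tilt importance weighting in the spirit of ExTRA, not the authors' implementation. It parameterizes source-sample weights as w(x) proportional to exp(theta . phi(x)) and fits theta by gradient ascent on the objective E_target[theta . phi(x)] - log E_source[exp(theta . phi(x))], whose maximizer minimizes the KL divergence between the tilted source and the target within this exponential family. The feature map phi, learning rate, and step count are assumptions for illustration; only source features and unlabeled target features are required.

```python
import numpy as np

def fit_exponential_tilt(phi_source, phi_target, lr=0.1, n_steps=500):
    """Fit tilt parameters theta so the weighted source matches the target.

    phi_source: (n_s, d) array of source features.
    phi_target: (n_t, d) array of (unlabeled) target features.
    Returns theta and normalized importance weights on the source samples.
    """
    theta = np.zeros(phi_source.shape[1])
    target_mean = phi_target.mean(axis=0)
    for _ in range(n_steps):
        logits = phi_source @ theta
        logits -= logits.max()          # numerical stabilization (cancels after normalization)
        w = np.exp(logits)
        w /= w.sum()                    # normalized weights on source samples
        # Gradient of E_t[theta.phi] - log E_s[exp(theta.phi)]:
        # the target feature mean minus the tilt-weighted source feature mean.
        grad = target_mean - w @ phi_source
        theta += lr * grad              # gradient ascent on a concave objective
    logits = phi_source @ theta
    logits -= logits.max()
    w = np.exp(logits)
    return theta, w / w.sum()

# Hypothetical usage: phi could be raw bid-request covariates or model scores.
# rng = np.random.default_rng(0)
# theta, weights = fit_exponential_tilt(rng.normal(size=(1000, 5)),
#                                       rng.normal(0.5, 1.0, size=(800, 5)))
```

The resulting weights can then rescale source examples when fitting, for example, a click-through-rate model for RTB, so that training better reflects the shifted target distribution.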
Related papers
- On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift [58.91436551466064]
This paper investigates the interconnections among three fundamental problems (calibration, quantification, and classifier accuracy prediction) under dataset shift conditions.
We show that access to an oracle for any one of these tasks enables the resolution of the other two.
We propose new methods for each problem based on direct adaptations of well-established methods borrowed from the other disciplines.
arXiv Detail & Related papers (2025-05-16T15:42:55Z) - A Semi-supervised CART Model for Covariate Shift [0.0]
This paper introduces a semi-supervised classification and regression tree (CART) that uses importance weighting to address distribution discrepancies.
Our method improves the predictive performance of the CART model by assigning greater importance weights to training samples that better reflect the target distribution.
Through simulation studies and applications to real-world medical data, we demonstrate significant improvements in predictive accuracy.
arXiv Detail & Related papers (2024-10-28T12:53:23Z) - Estimating calibration error under label shift without labels [47.57286245320775]
Existing CE estimators assume access to labels from the target domain, which are often unavailable in practice, i.e., when the model is deployed and used.
This work proposes a novel CE estimator under label shift, which is characterized by changes in the marginal label distribution $p(Y)$ while keeping the conditional $p(X|Y)$ constant between the source and target distributions.
Our contribution is an approach, which, by leveraging importance re-weighting of the labeled source distribution, provides consistent and unbiased CE estimation with respect to the shifted target distribution.
arXiv Detail & Related papers (2023-12-14T01:18:51Z) - Aggregation Weighting of Federated Learning via Generalization Bound
Estimation [65.8630966842025]
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions.
We replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
arXiv Detail & Related papers (2023-11-10T08:50:28Z) - Boosted Control Functions: Distribution generalization and invariance in confounded models [10.503777692702952]
We introduce a strong notion of invariance that allows for distribution generalization even in the presence of nonlinear, non-identifiable structural functions.
We propose the ControlTwicing algorithm to estimate the Boosted Control Function (BCF) using flexible machine-learning techniques.
arXiv Detail & Related papers (2023-10-09T15:43:46Z) - Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk
Minimization Framework [12.734559823650887]
In the presence of distribution shifts, fair machine learning models may behave unfairly on test data.
Existing algorithms require full access to the data and cannot be used when only small batches are available.
This paper proposes the first distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph.
arXiv Detail & Related papers (2023-09-20T23:25:28Z) - Chasing Fairness Under Distribution Shift: A Model Weight Perturbation
Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Mandoline: Model Evaluation under Distribution Shift [8.007644303175395]
Machine learning models are often deployed in different settings than they were trained and validated on.
We develop Mandoline, a new evaluation framework that mitigates these issues.
Users write simple "slicing functions" - noisy, potentially correlated binary functions intended to capture possible axes of distribution shift.
arXiv Detail & Related papers (2021-07-01T17:57:57Z) - Robust Generalization despite Distribution Shift via Minimum
Discriminating Information [46.164498176119665]
We introduce a modeling framework where, in addition to training data, we have partial structural knowledge of the shifted test distribution.
We employ the principle of minimum discriminating information to embed the available prior knowledge.
We obtain explicit generalization bounds with respect to the unknown shifted distribution.
arXiv Detail & Related papers (2021-06-08T15:25:35Z) - WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution performance than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z) - Learning Invariant Representations and Risks for Semi-supervised Domain
Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z) - Estimating Generalization under Distribution Shifts via Domain-Invariant
Representations [75.74928159249225]
We use a set of domain-invariant predictors as a proxy for the unknown, true target labels.
The error of the resulting risk estimate depends on the target risk of the proxy model.
arXiv Detail & Related papers (2020-07-06T17:21:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.