An active learning framework for multi-group mean estimation
- URL: http://arxiv.org/abs/2505.14882v1
- Date: Tue, 20 May 2025 20:13:04 GMT
- Title: An active learning framework for multi-group mean estimation
- Authors: Abdellah Aznag, Rachel Cummings, Adam N. Elmachtoub,
- Abstract summary: We study a fundamental learning problem over multiple groups with unknown data distributions.<n>We propose an algorithm, Variance-UCB, that sequentially selects groups according to an upper confidence bound on the variance estimate.
- Score: 11.799152724436999
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a fundamental learning problem over multiple groups with unknown data distributions, where an analyst would like to learn the mean of each group. Moreover, we want to ensure that this data is collected in a relatively fair manner such that the noise of the estimate of each group is reasonable. In particular, we focus on settings where data are collected dynamically, which is important in adaptive experimentation for online platforms or adaptive clinical trials for healthcare. In our model, we employ an active learning framework to sequentially collect samples with bandit feedback, observing a sample in each period from the chosen group. After observing a sample, the analyst updates their estimate of the mean and variance of that group and chooses the next group accordingly. The analyst's objective is to dynamically collect samples to minimize the collective noise of the estimators, measured by the norm of the vector of variances of the mean estimators. We propose an algorithm, Variance-UCB, that sequentially selects groups according to an upper confidence bound on the variance estimate. We provide a general theoretical framework for providing efficient bounds on learning from any underlying distribution where the variances can be estimated reasonably. This framework yields upper bounds on regret that improve significantly upon all existing bounds, as well as a collection of new results for different objectives and distributions than those previously studied.
Related papers
- On the Interconnections of Calibration, Quantification, and Classifier Accuracy Prediction under Dataset Shift [58.91436551466064]
This paper investigates the interconnections among three fundamental problems, calibration, and quantification, under dataset shift conditions.<n>We show that access to an oracle for any one of these tasks enables the resolution of the other two.<n>We propose new methods for each problem based on direct adaptations of well-established methods borrowed from the other disciplines.
arXiv Detail & Related papers (2025-05-16T15:42:55Z) - Regularized Neural Ensemblers [55.15643209328513]
In this study, we explore employing regularized neural networks as ensemble methods.<n>Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions.<n>We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z) - A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Multi-Layer Personalized Federated Learning for Mitigating Biases in Student Predictive Analytics [8.642174401125263]
We propose a Multi-Layer Personalized Federated Learning (MLPFL) methodology to optimize inference accuracy over different layers of student grouping criteria.
In our approach, personalized models for individual student subgroups are derived from a global model.
Experiments on three real-world online course datasets show significant improvements achieved by our approach over existing student modeling benchmarks.
arXiv Detail & Related papers (2022-12-05T17:27:28Z) - On Learning Fairness and Accuracy on Multiple Subgroups [9.789933013990966]
We present a principled method for learning a fair predictor for all subgroups via formulating it as a bilevel objective.
Specifically, the subgroup specific predictors are learned in the lower-level through a small amount of data and the fair predictor.
In the upper-level, the fair predictor is updated to be close to all subgroup specific predictors.
arXiv Detail & Related papers (2022-10-19T18:59:56Z) - Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimize for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z) - Group selection and shrinkage: Structured sparsity for semiparametric
additive models [0.0]
Sparse regression and classification estimators that respect group structures have application to an assortment of statistical and machine learning problems.
We develop a framework for fitting the nonparametric surface and present finite error models.
We demonstrate their efficacy in modeling foot and economic predictors using many predictors.
arXiv Detail & Related papers (2021-05-25T17:00:25Z) - Representation Matters: Assessing the Importance of Subgroup Allocations
in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z) - Robust Grouped Variable Selection Using Distributionally Robust
Optimization [11.383869751239166]
We propose a Distributionally Robust Optimization (DRO) formulation with a Wasserstein-based uncertainty set for selecting grouped variables under perturbations.
We prove probabilistic bounds on the out-of-sample loss and the estimation bias, and establish the grouping effect of our estimator.
We show that our formulation produces an interpretable and parsimonious model that encourages sparsity at a group level.
arXiv Detail & Related papers (2020-06-10T22:32:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.