Semi-Supervised U-statistics
- URL: http://arxiv.org/abs/2402.18921v2
- Date: Sat, 9 Mar 2024 07:16:46 GMT
- Title: Semi-Supervised U-statistics
- Authors: Ilmun Kim, Larry Wasserman, Sivaraman Balakrishnan, Matey Neykov
- Abstract summary: We introduce semi-supervised U-statistics enhanced by the abundance of unlabeled data.
We show that the proposed approach exhibits notable efficiency gains over classical U-statistics.
We propose a refined approach that outperforms the classical U-statistic across all degeneracy regimes.
- Score: 22.696630428733204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised datasets are ubiquitous across diverse domains where
obtaining fully labeled data is costly or time-consuming. The prevalence of
such datasets has consistently driven the demand for new tools and methods that
exploit the potential of unlabeled data. Responding to this demand, we
introduce semi-supervised U-statistics enhanced by the abundance of unlabeled
data, and investigate their statistical properties. We show that the proposed
approach is asymptotically Normal and exhibits notable efficiency gains over
classical U-statistics by effectively integrating various powerful prediction
tools into the framework. To understand the fundamental difficulty of the
problem, we derive minimax lower bounds in semi-supervised settings and
showcase that our procedure is semi-parametrically efficient under regularity
conditions. Moreover, tailored to bivariate kernels, we propose a refined
approach that outperforms the classical U-statistic across all degeneracy
regimes, and demonstrate its optimality properties. Simulation studies are
conducted to corroborate our findings and to further demonstrate our framework.
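For orientation, the classical U-statistic that the abstract uses as its baseline averages a symmetric kernel over all pairs of labeled observations. The sketch below illustrates only that classical baseline for a bivariate kernel, not the semi-supervised estimator proposed by the authors. The names `u_statistic` and `variance_kernel` and the sample values are hypothetical, chosen purely for illustration.

```python
from itertools import combinations
from statistics import variance


def u_statistic(sample, kernel):
    """Average a symmetric two-argument kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def variance_kernel(a, b):
    # h(a, b) = (a - b)^2 / 2 is an unbiased kernel for the variance.
    return (a - b) ** 2 / 2


# Hypothetical labeled responses, used only to exercise the function.
labels = [2.0, 4.0, 4.0, 5.0, 7.0]
u_value = u_statistic(labels, variance_kernel)

# With this kernel the U-statistic coincides with the unbiased sample variance.
print(u_value, variance(labels))
```

With this choice of kernel the U-statistic reduces to the familiar unbiased sample variance, which gives a simple sanity check on the pairwise averaging.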
Related papers
- Testing Generalizability in Causal Inference [3.547529079746247]
There is no formal procedure for statistically evaluating generalizability in machine learning algorithms.
We propose a systematic and quantitative framework for evaluating model generalizability in causal inference settings.
By basing simulations on real data, our method ensures more realistic evaluations, which are often missing in current work.
arXiv Detail & Related papers (2024-11-05T11:44:00Z)
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involves optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z)
- A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams [0.0]
This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression.
The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism.
To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE).
arXiv Detail & Related papers (2023-12-12T19:23:54Z)
- Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets [101.5329678997916]
We study episodic two-player zero-sum Markov games (MGs) in the offline setting.
The goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori.
arXiv Detail & Related papers (2022-02-15T15:39:30Z)
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
- Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization [13.985534521589257]
We propose a novel framework to facilitate the learning of data distributions in high paucity scenarios.
The proposed framework naturally stems from posing the process of learning from incomplete data as a joint optimization task.
arXiv Detail & Related papers (2021-04-03T20:57:57Z)
- Incremental Semi-Supervised Learning Through Optimal Transport [0.0]
We propose a novel approach for transductive semi-supervised learning, using a complete bipartite edge-weighted graph.
The proposed approach uses the regularized optimal transport between empirical measures defined on labelled and unlabelled data points in order to obtain an affinity matrix from the optimal transport plan.
arXiv Detail & Related papers (2021-03-22T15:31:53Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)