Online AUC Optimization for Sparse High-Dimensional Datasets
- URL: http://arxiv.org/abs/2009.10867v1
- Date: Wed, 23 Sep 2020 00:50:01 GMT
- Title: Online AUC Optimization for Sparse High-Dimensional Datasets
- Authors: Baojian Zhou, Yiming Ying, Steven Skiena
- Abstract summary: Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification.
Current online AUC optimization algorithms have a high per-iteration cost of $\mathcal{O}(d)$.
We propose a new algorithm, \textsc{FTRL-AUC}, which can process data in an online fashion with a much cheaper per-iteration cost.
- Score: 32.77252579365118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Area Under the ROC Curve (AUC) is a widely used performance measure for
imbalanced classification arising from many application domains where
high-dimensional sparse data is abundant. In such cases, each $d$ dimensional
sample has only $k$ non-zero features with $k \ll d$, and data arrives
sequentially in a streaming form. Current online AUC optimization algorithms
have high per-iteration cost $\mathcal{O}(d)$ and usually produce non-sparse
solutions in general, and hence are not suitable for handling the data
challenge mentioned above.
In this paper, we aim to directly optimize the AUC score for high-dimensional
sparse datasets under online learning setting and propose a new algorithm,
\textsc{FTRL-AUC}. Our proposed algorithm can process data in an online fashion
with a much cheaper per-iteration cost $\mathcal{O}(k)$, making it amenable for
high-dimensional sparse streaming data analysis. Our new algorithmic design
critically depends on a novel reformulation of the U-statistics AUC objective
function as the empirical saddle point reformulation, and the innovative
introduction of the "lazy update" rule so that the per-iteration complexity is
dramatically reduced from $\mathcal{O}(d)$ to $\mathcal{O}(k)$. Furthermore,
\textsc{FTRL-AUC} can inherently capture sparsity more effectively by applying
a generalized Follow-The-Regularized-Leader (FTRL) framework.
Experiments on real-world datasets demonstrate that \textsc{FTRL-AUC}
significantly improves both run time and model sparsity while achieving
competitive AUC scores compared with the state-of-the-art methods. Comparison
with the online learning method for logistic loss demonstrates that
\textsc{FTRL-AUC} achieves higher AUC scores especially when datasets are
imbalanced.
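The abstract's $\mathcal{O}(d) \to \mathcal{O}(k)$ claim rests on lazily touching only the $k$ active coordinates of each sample. The paper's exact update rule is not reproduced here; the sketch below illustrates the general idea with a generic FTRL-Proximal step with L1 shrinkage, where all constants and names (`ALPHA`, `L1`, `lazy_step`) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Hypothetical hyperparameters, chosen for illustration only.
ALPHA, L1 = 0.5, 0.1

def ftrl_weight(z_j, n_j):
    """FTRL-Proximal closed-form weight for one coordinate:
    soft-threshold the accumulated gradient z_j, then scale by the
    adaptive learning rate derived from n_j (sum of squared grads)."""
    if abs(z_j) <= L1:
        return 0.0  # L1 shrinkage drives the weight to an exact zero
    return -(z_j - np.sign(z_j) * L1) * ALPHA / np.sqrt(n_j)

def lazy_step(z, n, idx, grad):
    """One online step touching only the k active coordinates `idx`.
    z and n are dicts, so coordinates absent from the current sample
    cost nothing -- the essence of a lazy update."""
    w = {}
    for j, g in zip(idx, grad):
        z_j, n_j = z.get(j, 0.0), n.get(j, 0.0)
        w[j] = ftrl_weight(z_j, n_j) if n_j > 0 else 0.0
        z[j] = z_j + g
        n[j] = n_j + g * g
    return w  # sparse weights actually used this round

# Toy stream: d may be huge, but this sample has only k=2 active features.
z, n = {}, {}
w = lazy_step(z, n, idx=[3, 900_000], grad=[0.4, -0.2])
```

Per-step work is proportional to `len(idx)`, not to the ambient dimension, and the L1 term in `ftrl_weight` is what keeps the learned model itself sparse.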
Related papers
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization. The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of order $-2/3$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z)
- Efficiency vs. Fidelity: A Comparative Analysis of Diffusion Probabilistic Models and Flow Matching on Low-Resource Hardware [0.0]
Denoising Diffusion Probabilistic Models (DDPMs) have established a new state-of-the-art in generative image synthesis. This study presents a comparative analysis of DDPMs against the emerging Flow Matching paradigm.
arXiv Detail & Related papers (2025-11-24T18:19:42Z)
- Evolution Strategies at the Hyperscale [57.75314521465674]
We introduce EGGROLL, an evolution strategies (ES) algorithm designed to scale backprop-free optimization to large population sizes. ES is a set of powerful blackbox optimisation methods that can handle non-differentiable or noisy objectives. EGGROLL overcomes these bottlenecks by generating random matrices $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$ with $r \ll \min(m, n)$ to form a low-rank matrix perturbation $AB^\top$.
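The low-rank perturbation $AB^\top$ described in that summary is easy to illustrate. The numpy sketch below is a generic demonstration of the rank-$r$ trick (the shapes, scale `sigma`, and matrix-vector application are illustrative assumptions, not EGGROLL's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 256, 128, 4  # r << min(m, n): the low-rank factor dimension

# A full-rank Gaussian perturbation would need m*n random draws; instead
# draw two thin factors whose product is a rank-r perturbation A @ B.T.
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))

W = rng.standard_normal((m, n))  # parameter matrix being perturbed
sigma = 0.01
x = rng.standard_normal(n)

# Never materialize A @ B.T: apply it right-to-left in O((m + n) * r) time.
y = W @ x + sigma * (A @ (B.T @ x))
```

The associativity of matrix products is the whole trick: `A @ (B.T @ x)` gives the same vector as forming the dense `m x n` perturbation first, at a fraction of the memory and compute.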
arXiv Detail & Related papers (2025-11-20T18:56:05Z)
- Online AUC Optimization Based on Second-order Surrogate Loss [3.9269332672496424]
The Area Under the Curve is an important performance metric for classification tasks. We develop a novel second-order surrogate loss based on the pairwise hinge loss, together with an efficient online storage scheme. Our approach introduces a new paradigm that directly substitutes the entire aggregated pairwise loss with a loss function constructed from the first- and second-order statistics.
arXiv Detail & Related papers (2025-10-24T07:08:22Z)
- DRAUC: An Instance-wise Distributionally Robust AUC Optimization Framework [133.26230331320963]
Area Under the ROC Curve (AUC) is a widely employed metric in long-tailed classification scenarios.
We propose an instance-wise surrogate loss of Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it.
arXiv Detail & Related papers (2023-11-06T12:15:57Z)
- Does it pay to optimize AUC? [17.54773029913898]
We present an efficient algorithm, namely AUC-opt, to find the provably optimal AUC linear classifier in $\mathbb{R}^2$.
Experiments show AUC-opt achieves significant improvements on between 17 and 40 of 50 t-SNE training datasets in $\mathbb{R}^2$, and on between 4 and 42 in $\mathbb{R}^3$.
arXiv Detail & Related papers (2023-06-02T13:28:53Z)
- AUC Optimization from Multiple Unlabeled Datasets [14.318887072787938]
We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem.
We show that the proposed U$^m$-AUC is effective theoretically and empirically.
arXiv Detail & Related papers (2023-05-25T06:43:42Z)
- Provably Efficient Neural Offline Reinforcement Learning via Perturbed Rewards [33.88533898709351]
VIPeR amalgamates the randomized value function idea with the pessimism principle.
It implicitly obtains pessimism by simply perturbing the offline data multiple times.
It is both provably and computationally efficient in general Markov decision processes (MDPs) with neural network function approximation.
arXiv Detail & Related papers (2023-02-24T17:52:12Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air federated learning.
Compared to the state-of-the-art scheme, the proposed scheme has drastically lower computational complexity.
The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
- Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.