Simple Weak Coresets for Non-Decomposable Classification Measures
- URL: http://arxiv.org/abs/2312.09885v1
- Date: Fri, 15 Dec 2023 15:32:25 GMT
- Title: Simple Weak Coresets for Non-Decomposable Classification Measures
- Authors: Jayesh Malaviya, Anirban Dasgupta and Rachit Chhaya
- Abstract summary: We show that stratified uniform sampling based coresets have excellent empirical performance that is backed by theoretical guarantees.
We focus on the F1 score and the Matthews Correlation Coefficient, two widely used non-decomposable objective functions that are nontrivial to optimize, and show that uniform coresets attain a lower bound on coreset size.
- Score: 3.5819148482955514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While coresets have been growing in terms of their applications, barring a few exceptions, they have mostly been limited to unsupervised settings. We consider supervised classification problems, and non-decomposable evaluation measures in such settings. We show that stratified uniform sampling based coresets have excellent empirical performance that is backed by theoretical guarantees. We focus on the F1 score and the Matthews Correlation Coefficient, two widely used non-decomposable objective functions that are nontrivial to optimize for, and show that uniform coresets attain a lower bound on coreset size while having good empirical performance, comparable with "smarter" coreset construction strategies.
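A minimal sketch of the stratified uniform sampling strategy the abstract describes: sample uniformly within each class, attach inverse-sampling-probability weights, train on the weighted coreset, and evaluate F1 and MCC. The logistic regression model, the synthetic imbalanced data, and the per-class sample size are illustrative assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef

def stratified_uniform_coreset(X, y, m_per_class, seed=None):
    """Sample up to m_per_class points uniformly from each class and
    attach inverse-probability weights so weighted sums stay unbiased."""
    rng = np.random.default_rng(seed)
    idx, w = [], []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        k = min(m_per_class, len(members))
        chosen = rng.choice(members, size=k, replace=False)
        idx.append(chosen)
        w.append(np.full(k, len(members) / k))  # weight = N_c / k
    idx, w = np.concatenate(idx), np.concatenate(w)
    return X[idx], y[idx], w

# Illustrative usage on a synthetic, label-imbalanced dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=10_000) > 1.2).astype(int)

Xc, yc, wc = stratified_uniform_coreset(X, y, m_per_class=200, seed=1)
model = LogisticRegression().fit(Xc, yc, sample_weight=wc)

pred = model.predict(X)
print("F1 :", f1_score(y, pred))   # harmonic mean of precision and recall
print("MCC:", matthews_corrcoef(y, pred))
```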
Related papers
- Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints [69.27190330994635]
Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms.
We propose an innovative method, which maintains optimization priority order over the model performance and coreset size.
Empirically, extensive experiments confirm its superiority, often yielding better model performance with smaller coreset sizes.
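Read together with the title, the prioritized optimization can be sketched as a constrained problem: minimize coreset size subject to matching full-data model performance. The notation below is a schematic paraphrase, not the paper's exact formulation.

```latex
\min_{S \subseteq D} |S|
\quad \text{s.t.} \quad
\mathcal{L}\big(\theta^*(S)\big) \le \mathcal{L}\big(\theta^*(D)\big) + \epsilon,
\qquad
\theta^*(S) = \arg\min_{\theta} \frac{1}{|S|} \sum_{(x, y) \in S} \ell(\theta; x, y)
```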
arXiv Detail & Related papers (2023-11-15T03:43:04Z)
- Coverage-centric Coreset Selection for High Pruning Rates [11.18635356469467]
One-shot coreset selection aims to select a subset of the training data, given a pruning rate, that can achieve high accuracy for models that are subsequently trained only with that subset.
State-of-the-art coreset selection methods typically assign an importance score to each example and select the most important examples to form a coreset.
But at high pruning rates, they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection.
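A minimal sketch of the two selection strategies being contrasted, assuming per-example importance scores are already computed; using distance to the data mean as the score is a stand-in for real importance measures (e.g., forgetting counts), purely for illustration.

```python
import numpy as np

def score_based_coreset(scores, keep_frac):
    """Keep the highest-scoring examples; this is the strategy reported
    to collapse at high pruning rates (small keep_frac)."""
    k = max(1, int(len(scores) * keep_frac))
    return np.argsort(scores)[-k:]

def random_coreset(n, keep_frac, seed=None):
    """Uniform random baseline that can win at high pruning rates."""
    rng = np.random.default_rng(seed)
    k = max(1, int(n * keep_frac))
    return rng.choice(n, size=k, replace=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))
scores = np.linalg.norm(X - X.mean(axis=0), axis=1)  # illustrative scores

for frac in (0.5, 0.1, 0.01):  # 50%, 90%, 99% pruning
    hard = score_based_coreset(scores, frac)
    rand = random_coreset(len(X), frac, seed=1)
    print(f"keep {frac:.0%}: score-based picks {len(hard)}, random picks {len(rand)}")
```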
arXiv Detail & Related papers (2022-10-28T00:14:00Z)
- Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has better discrimination ability, and its Bayes optimal score function gives a correct top-k ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z)
- An Empirical Evaluation of $k$-Means Coresets [4.45709593827781]
There is no work on comparing the quality of available $k$-means coresets.
We propose a benchmark for which we argue that computing coresets is challenging.
We conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
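One common quality measure such an evaluation can use, shown here as a hedged sketch, is the relative error between the weighted k-means cost on the coreset and the exact cost on the full data for a fixed set of candidate centers; the uniform-sampling coreset and the Gaussian data are illustrative stand-ins for the algorithms and benchmark actually evaluated.

```python
import numpy as np

def kmeans_cost(X, centers, weights=None):
    """(Weighted) sum of squared distances to each point's nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
    return d2.sum() if weights is None else d2 @ weights

def uniform_coreset(X, m, seed=None):
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx], np.full(m, len(X) / m)  # inverse-probability weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 2))
C, w = uniform_coreset(X, m=250, seed=1)

centers = rng.normal(size=(10, 2))  # an arbitrary query (set of centers)
full, core = kmeans_cost(X, centers), kmeans_cost(C, centers, w)
print("relative cost error:", abs(full - core) / full)
```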
arXiv Detail & Related papers (2022-07-03T06:47:53Z)
- A Unified Approach to Coreset Learning [24.79658173754555]
A coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries.
We propose a generic, learning-based algorithm for construction of coresets.
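The informal definition above matches the standard ε-coreset guarantee; in common notation (a paraphrase, not necessarily the paper's exact statement):

```latex
% C \subseteq P with weights w is an \epsilon-coreset for loss f and queries Q if
\forall q \in Q:\quad
\Big|\sum_{x \in C} w(x)\, f(x, q) - \sum_{x \in P} f(x, q)\Big|
\;\le\; \epsilon \sum_{x \in P} f(x, q)
```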
arXiv Detail & Related papers (2021-11-04T17:48:05Z)
- Bend-Net: Bending Loss Regularized Multitask Learning Network for Nuclei Segmentation in Histopathology Images [65.47507533905188]
We propose a novel multitask learning network with a bending loss regularizer to separate overlapped nuclei accurately.
The newly proposed multitask learning architecture enhances the generalization by learning shared representation from three tasks.
The proposed bending loss assigns high penalties to concave contour points with large curvatures and small penalties to convex contour points with small curvatures.
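A minimal sketch of a curvature-dependent contour penalty in that spirit: estimate a discrete curvature at each polygon vertex from its neighboring edges, then weight concave points heavily and convex points lightly. The turning-angle curvature proxy, the sign convention, and the weights are assumptions for illustration, not the paper's bending loss.

```python
import numpy as np

def bending_penalty(contour, w_concave=1.0, w_convex=0.1):
    """contour: (n, 2) polygon vertices in counter-clockwise order."""
    a = contour - np.roll(contour, 1, axis=0)    # incoming edge vectors
    b = np.roll(contour, -1, axis=0) - contour   # outgoing edge vectors
    cross = a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0]
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1)
                                 * np.linalg.norm(b, axis=1) + 1e-8)
    turn = np.arccos(np.clip(cos, -1.0, 1.0))    # turning angle ~ curvature
    weights = np.where(cross < 0, w_concave, w_convex)  # cross<0: concave (CCW)
    return float((weights * turn).sum())
```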
arXiv Detail & Related papers (2021-09-30T17:29:44Z)
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- A Statistical Perspective on Coreset Density Estimation [26.023056426554415]
Coresets have emerged as a powerful tool to summarize data by selecting a small subset of the original observations.
We show that practical coreset kernel density estimators are near-minimax optimal over a large class of Hölder-smooth densities.
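A minimal sketch of the kind of estimator being analyzed: a kernel density estimate built from a small uniform subsample, compared against the full-data KDE. The Gaussian kernel, fixed bandwidth, and uniform subsampling are illustrative choices, not the constructions studied in the paper.

```python
import numpy as np

def gaussian_kde(train, query, h):
    """Average of Gaussian kernels of bandwidth h centered at train points."""
    z = (query[:, None] - train[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.normal(size=20_000)                        # full sample
coreset = rng.choice(data, size=400, replace=False)   # uniform coreset

grid = np.linspace(-4.0, 4.0, 200)
full, small = gaussian_kde(data, grid, 0.3), gaussian_kde(coreset, grid, 0.3)
print("max pointwise density gap:", np.abs(full - small).max())
```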
arXiv Detail & Related papers (2020-11-10T05:18:43Z)
- Prior Guided Feature Enrichment Network for Few-Shot Segmentation [64.91560451900125]
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results.
Few-shot segmentation is proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples.
These frameworks still face the challenge of reduced generalization ability on unseen classes due to inappropriate use of high-level semantic information.
arXiv Detail & Related papers (2020-08-04T10:41:32Z)
- Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
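The cardinality-constrained bilevel problem can be written schematically as below (standard notation for this line of work; a paraphrase rather than the paper's exact statement): the outer level picks at most k weighted points, the inner level trains on them, and the trained model is judged on the full data.

```latex
\min_{w \ge 0,\ \|w\|_0 \le k}\; \sum_{i=1}^{n} \ell\big(x_i, y_i;\, \theta^*(w)\big)
\quad \text{s.t.} \quad
\theta^*(w) \in \arg\min_{\theta} \sum_{i=1}^{n} w_i\, \ell(x_i, y_i;\, \theta)
```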
arXiv Detail & Related papers (2020-06-06T14:20:25Z)
- Unifying Few- and Zero-Shot Egocentric Action Recognition [3.1368611610608848]
We propose a new set of splits derived from the EPIC-KITCHENS dataset that allow evaluation of open-set classification.
We show that adding a metric-learning loss to the conventional direct-alignment baseline can improve zero-shot classification by as much as 10%.
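A minimal sketch of what "adding a metric-learning loss to a direct-alignment baseline" can look like, here a triplet margin term combined with an alignment term in PyTorch; the embedding names, loss weighting, and choice of triplet loss are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def combined_loss(video_emb, pos_text_emb, neg_text_emb,
                  alpha=0.5, margin=0.2):
    """Direct alignment pulls a clip toward its matching label embedding;
    the triplet term additionally pushes away a non-matching label."""
    align = F.mse_loss(video_emb, pos_text_emb)
    triplet = F.triplet_margin_loss(video_emb, pos_text_emb,
                                    neg_text_emb, margin=margin)
    return align + alpha * triplet
```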
arXiv Detail & Related papers (2020-05-27T02:23:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.