Robust Coreset for Continuous-and-Bounded Learning (with Outliers)
- URL: http://arxiv.org/abs/2107.00068v1
- Date: Wed, 30 Jun 2021 19:24:20 GMT
- Title: Robust Coreset for Continuous-and-Bounded Learning (with Outliers)
- Authors: Zixiu Wang, Yiwen Guo and Hu Ding
- Abstract summary: We propose a novel robust coreset method for the continuous-and-bounded learning problem (with outliers).
Our robust coreset can be efficiently maintained in a fully dynamic environment.
- Score: 30.91741925182613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this big data era, we often confront large-scale data in many machine
learning tasks. A common approach to dealing with large-scale data is to build
a small summary, e.g., a coreset, that can efficiently represent the original
input. However, real-world datasets usually contain outliers, and most existing
coreset construction methods are not resilient against them (in particular, the
outliers can be placed arbitrarily in the space by an adversarial attacker). In
this paper, we propose a novel robust coreset method for the
continuous-and-bounded learning problem (with outliers), which covers a broad
range of popular optimization objectives in machine learning, such as logistic
regression and $k$-means clustering. Moreover, our robust coreset can be
efficiently maintained in a fully dynamic environment. To the best of our
knowledge, this is the first robust and fully dynamic coreset construction
method for these optimization problems. We also conduct experiments to evaluate
the effectiveness of our robust coreset in practice.
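The basic coreset idea can be illustrated with a minimal uniform-sampling sketch (an illustration only, not the authors' construction, which uses a more careful importance-based scheme and must additionally cope with outliers): each sampled point carries a weight so that weighted costs on the small set approximate costs on the full set.

```python
import numpy as np

def sampled_coreset(points, m, seed=0):
    """Toy coreset via uniform sampling: each of the m sampled points
    carries weight n/m, so weighted sums over the coreset approximate
    sums over the full data set."""
    rng = np.random.default_rng(seed)
    n = len(points)
    idx = rng.choice(n, size=m, replace=False)
    return points[idx], np.full(m, n / m)

def kmeans_cost(points, weights, centers):
    """Weighted k-means cost: sum over points of
    weight * squared distance to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return float((weights * d2.min(axis=1)).sum())

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 2))
core, w = sampled_coreset(data, 500)
centers = np.zeros((1, 2))
full_cost = kmeans_cost(data, np.ones(len(data)), centers)
core_cost = kmeans_cost(core, w, centers)
# The weighted coreset cost should be close to the full-data cost.
```

Sensitivity-based sampling replaces the uniform choice with probabilities proportional to each point's worst-case contribution to the cost, which is what yields provable approximation guarantees; uniform sampling is only a baseline.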
Related papers
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
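For context, the classic greedy (Gonzalez) heuristic below is the standard 2-approximation for the plain k-center objective; the paper's factor-3 algorithm additionally handles the weighted combination with the uncertainty sampling objective, which this sketch does not attempt.

```python
import numpy as np

def greedy_k_center(points, k, seed=0):
    """Gonzalez's farthest-point heuristic: repeatedly add the point
    farthest from the chosen centers. A classic 2-approximation for
    the plain k-center objective."""
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(points)))]
    # dist[i] = distance from point i to its nearest chosen center
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return centers, float(dist.max())

rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(loc=c, scale=0.1, size=(100, 2))
                   for c in [(0, 0), (10, 0), (0, 10)]])
idx, radius = greedy_k_center(blobs, 3)
# With three well-separated blobs, the covering radius is small.
```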
arXiv Detail & Related papers (2023-12-17T04:41:07Z) - Composable Core-sets for Diversity Approximation on Multi-Dataset Streams [4.765131728094872]
Composable core-sets are core-sets with the property that subsets of the core-set can be unioned together to obtain an approximation of the original data.
We introduce a core-set construction algorithm for constructing composable core-sets to summarize streamed data for use in active learning environments.
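A hypothetical sketch of the composability property (assuming greedy farthest-point selection, a standard building block for diversity core-sets; the paper's actual construction may differ): summarize each chunk independently, union the per-chunk summaries, then re-select from the union.

```python
import numpy as np

def farthest_point_subset(points, k):
    """Greedy farthest-point selection: a standard building block
    for diversity core-sets."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

def composable_diversity_summary(chunks, k):
    """Summarize each chunk independently, union the per-chunk
    core-sets, then select the final diverse subset from the union.
    The union step is exactly the composability property."""
    union = np.vstack([farthest_point_subset(c, k) for c in chunks])
    return farthest_point_subset(union, k)

rng = np.random.default_rng(0)
chunks = [rng.normal(loc=c, scale=0.5, size=(200, 2))
          for c in [(0, 0), (100, 0), (0, 100)]]
summary = composable_diversity_summary(chunks, 3)
# The final summary picks one point from each far-apart cluster.
```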
arXiv Detail & Related papers (2023-08-10T23:24:51Z) - Coresets for Relational Data and The Applications [8.573878018370547]
A coreset is a small set that can preserve the structure of the original input data set.
We show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression, and SVM.
arXiv Detail & Related papers (2022-10-09T12:46:27Z) - Adaptive Second Order Coresets for Data-efficient Machine Learning [5.362258158646462]
Training machine learning models on datasets incurs substantial computational costs.
We propose AdaCore to extract subsets of the training examples for efficient machine learning.
arXiv Detail & Related papers (2022-07-28T05:43:09Z) - Can we achieve robustness from data alone? [0.7366405857677227]
Adversarial training and its variants have come to be the prevailing methods to achieve adversarially robust classification using neural networks.
We devise a meta-learning method for robust classification that optimizes the dataset, prior to deployment, in a principled way.
Experiments on MNIST and CIFAR-10 demonstrate that the datasets we produce enjoy very high robustness against PGD attacks.
arXiv Detail & Related papers (2022-07-24T12:14:48Z) - Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
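As a point of reference, the simplest rehearsal buffer is reservoir sampling, which keeps a uniform random subset of the stream seen so far; coreset-selection methods such as OCS replace this uniform rule with representativeness and affinity scores. The sketch below shows only the uniform baseline.

```python
import random

class ReservoirBuffer:
    """Reservoir-sampling rehearsal buffer: keeps a uniform random
    subset of the stream seen so far. Informed coreset selection
    would replace the uniform acceptance rule with a scoring rule."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Keep the new example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

buf = ReservoirBuffer(capacity=50)
for x in range(1000):
    buf.add(x)
# buf.buffer now holds a uniform random subset of 0..999.
```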
arXiv Detail & Related papers (2021-06-02T11:39:25Z) - Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
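Streaming extensions like the one mentioned here are commonly obtained via the generic merge-and-reduce framework. The sketch below uses plain weighted sampling as the "reduce" step, which is an assumption for illustration, not the paper's SVM-specific construction.

```python
import numpy as np

def reduce_weighted(points, weights, m, rng):
    """Shrink a weighted point set to m points by weighted sampling,
    rescaling weights so the total weight is preserved."""
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=m, replace=True, p=p)
    return points[idx], np.full(m, weights.sum() / m)

def merge_and_reduce(batches, m, seed=0):
    """Classic merge-and-reduce: keep one coreset per 'level'; whenever
    two coresets of the same level exist, merge them and reduce back
    to size m. This turns an offline coreset into a streaming one."""
    rng = np.random.default_rng(seed)
    levels = {}  # level -> (points, weights)
    for batch in batches:
        pts, w = reduce_weighted(batch, np.ones(len(batch)), m, rng)
        lvl = 0
        while lvl in levels:
            p2, w2 = levels.pop(lvl)
            pts, w = reduce_weighted(np.vstack([pts, p2]),
                                     np.concatenate([w, w2]), m, rng)
            lvl += 1
        levels[lvl] = (pts, w)
    all_p = np.vstack([p for p, _ in levels.values()])
    all_w = np.concatenate([w for _, w in levels.values()])
    return all_p, all_w

rng = np.random.default_rng(1)
batches = [rng.normal(size=(1000, 2)) for _ in range(8)]
pts, wts = merge_and_reduce(batches, m=200)
# Total weight equals the number of points seen in the stream.
```

The number of retained coresets grows only logarithmically with the stream length, which is why the framework is the standard route from an offline coreset to streaming and distributed settings.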
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.