Coresets for Relational Data and The Applications
- URL: http://arxiv.org/abs/2210.04249v1
- Date: Sun, 9 Oct 2022 12:46:27 GMT
- Title: Coresets for Relational Data and The Applications
- Authors: Jiaxiang Chen, Qingyuan Yang, Ruomin Huang and Hu Ding
- Abstract summary: A coreset is a small set that can preserve the structure of the original input data set.
We show that our coreset approach can be applied for the machine learning tasks, such as clustering, logistic regression and SVM.
- Score: 8.573878018370547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A coreset is a small set that can approximately preserve the structure of the
original input data set. Therefore we can run our algorithm on a coreset so as
to reduce the total computational complexity. Conventional coreset techniques
assume that the input data set is available to process explicitly. However,
this assumption may not hold in real-world scenarios. In this paper, we
consider the problem of coresets construction over relational data. Namely, the
data is decoupled into several relational tables, and it could be very
expensive to directly materialize the data matrix by joining the tables. We
propose a novel approach called ``aggregation tree with pseudo-cube'' that can
build a coreset from bottom to up. Moreover, our approach can neatly circumvent
several troublesome issues of relational learning problems [Khamis et al., PODS
2019]. Under some mild assumptions, we show that our coreset approach can be
applied for the machine learning tasks, such as clustering, logistic regression
and SVM.
Related papers
- Relational Deep Learning: Graph Representation Learning on Relational
Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z) - Composable Core-sets for Diversity Approximation on Multi-Dataset
Streams [4.765131728094872]
Composable core-sets are core-sets with the property that subsets of the core set can be unioned together to obtain an approximation for the original data.
We introduce a core-set construction algorithm for constructing composable core-sets to summarize streamed data for use in active learning environments.
arXiv Detail & Related papers (2023-08-10T23:24:51Z) - AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set, that closely resembles the loss function.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
arXiv Detail & Related papers (2023-05-19T19:59:52Z) - Meta Clustering Learning for Large-scale Unsupervised Person
Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL)
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z) - A Unified Approach to Coreset Learning [24.79658173754555]
Coreset of a given dataset and loss function is usually a small weighed set that approximates this loss for every query from a given set of queries.
We propose a generic, learning-based algorithm for construction of coresets.
arXiv Detail & Related papers (2021-11-04T17:48:05Z) - Robust Coreset for Continuous-and-Bounded Learning (with Outliers) [30.91741925182613]
We propose a novel robust coreset method for the em continuous-and-bounded learning problem (with outliers)
Our robust coreset can be efficiently maintained in fully-dynamic environment.
arXiv Detail & Related papers (2021-06-30T19:24:20Z) - Introduction to Core-sets: an Updated Survey [18.059360820527687]
In machine learning problems, the goal is to minimize or maximize an objective function over some space of candidate solutions.
Traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams.
This survey summarizes such constructions in a retrospective way, that aims to unified and simplify the state-of-the-art.
arXiv Detail & Related papers (2020-11-18T16:31:34Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z) - Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.