Introduction to Core-sets: an Updated Survey
- URL: http://arxiv.org/abs/2011.09384v1
- Date: Wed, 18 Nov 2020 16:31:34 GMT
- Title: Introduction to Core-sets: an Updated Survey
- Authors: Dan Feldman
- Abstract summary: In machine learning problems, the goal is to minimize or maximize an objective function over some space of candidate solutions.
Traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams.
This survey summarizes such constructions in a retrospective way that aims to unify and simplify the state-of-the-art.
- Score: 18.059360820527687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In optimization or machine learning problems we are given a set of items,
usually points in some metric space, and the goal is to minimize or maximize an
objective function over some space of candidate solutions. For example, in
clustering problems, the input is a set of points in some metric space, and a
common goal is to compute a set of centers in some other space (points, lines)
that will minimize the sum of distances to these points. In database queries,
we may need to compute such a sum for a specific query set of $k$ centers.
However, traditional algorithms cannot handle modern systems that require
parallel real-time computations of infinite distributed streams from sensors
such as GPS, audio or video that arrive to a cloud, or networks of weaker
devices such as smartphones or robots.
Core-set is a "small data" summarization of the input "big data", where every
possible query has approximately the same answer on both data sets. Generic
techniques enable efficient coreset maintenance of streaming,
distributed and dynamic data. Traditional algorithms can then be applied to
these coresets to maintain approximately optimal solutions.
The challenge is to design coresets with provable tradeoff between their size
and approximation error. This survey summarizes such constructions in a
retrospective way that aims to unify and simplify the state-of-the-art.
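
For concreteness, here is a minimal formalization of the guarantee described above, using the clustering cost from the abstract as the running example (the notation is ours, not fixed by the survey). For an input set $P$ in a metric space and a query set $C$ of $k$ centers, write

$$\mathrm{cost}(P, C) \;=\; \sum_{p \in P} \min_{c \in C} \mathrm{dist}(p, c).$$

A weighted set $S$ with weights $w : S \to [0,\infty)$ is an $\varepsilon$-coreset of $P$ if, for every query $C$,

$$(1-\varepsilon)\,\mathrm{cost}(P, C) \;\le\; \sum_{q \in S} w(q)\,\min_{c \in C} \mathrm{dist}(q, c) \;\le\; (1+\varepsilon)\,\mathrm{cost}(P, C),$$

which is exactly the "approximately the same answer on both data sets" property; the challenge mentioned above is the tradeoff between how small $S$ can be and how small $\varepsilon$ is.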
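The following Python sketch illustrates that property numerically for the $k$-means cost: it compares the cost of an arbitrary query on the full input and on a small weighted summary. The uniform sample used here is only an unbiased estimator of the cost, not a provable coreset (provable constructions typically use sensitivity/importance sampling); all names and parameters below are ours, chosen for illustration.

```python
import numpy as np

def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared distances from each point to its nearest center."""
    if weights is None:
        weights = np.ones(len(points))
    # squared distance from every point to every center, then the minimum per point
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 2))     # the "big data": 10,000 points in the plane

# Illustrative summary: a uniform sample, re-weighted so its total weight is |P|.
# (Only an unbiased estimate of the cost; provable coresets use sensitivity sampling.)
m = 200
idx = rng.choice(len(P), size=m, replace=False)
S, w = P[idx], np.full(m, len(P) / m)

C = rng.normal(size=(5, 2))          # an arbitrary query: k = 5 centers
print(kmeans_cost(P, C))             # cost on the full data
print(kmeans_cost(S, C, weights=w))  # roughly the same cost on the 200-point summary
```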
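The "generic techniques" for streaming, distributed, and dynamic data mentioned in the abstract are usually instances of merge-and-reduce: coresets compose under union and nesting (a coreset of a union of coresets is still a coreset, with slightly larger error), so a stream can be summarized by keeping only O(log n) small summaries. The sketch below is a hedged illustration of that idea; `reduce_summary` is a placeholder weighted-sampling step standing in for any provable coreset construction, and all function names and parameters are ours.

```python
import numpy as np

def reduce_summary(points, weights, m, rng):
    """Placeholder coreset step: sample m points with probability proportional to
    weight and re-weight so the total weight is preserved. Any provable coreset
    construction (e.g. sensitivity sampling) can be plugged in here instead."""
    if len(points) <= m:
        return points, weights
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=m, replace=False, p=p)
    return points[idx], np.full(m, weights.sum() / m)

def merge(a, b):
    """Union of two weighted summaries (coresets compose under union)."""
    return np.vstack([a[0], b[0]]), np.concatenate([a[1], b[1]])

def stream_coreset(chunks, m, rng):
    """Merge-and-reduce over a stream: bucket i holds a summary of ~2^i chunks,
    so only O(log n) summaries of size m are kept in memory at any time."""
    buckets = []                                   # buckets[i] is None or a summary
    for chunk in chunks:
        cur = reduce_summary(chunk, np.ones(len(chunk)), m, rng)
        i = 0
        while i < len(buckets) and buckets[i] is not None:
            cur = reduce_summary(*merge(cur, buckets[i]), m, rng)  # carry, like binary addition
            buckets[i] = None
            i += 1
        if i == len(buckets):
            buckets.append(None)
        buckets[i] = cur
    # final summary: merge the remaining buckets and reduce once more
    parts = [b for b in buckets if b is not None]
    out = parts[0]
    for b in parts[1:]:
        out = merge(out, b)
    return reduce_summary(*out, m, rng)

rng = np.random.default_rng(0)
stream = (rng.normal(size=(1_000, 2)) for _ in range(64))   # 64 chunks of 1,000 points
S, w = stream_coreset(stream, m=200, rng=rng)
print(len(S), w.sum())                                       # 200 points, total weight ~64,000
```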
Related papers
- Inferring Neural Signed Distance Functions by Overfitting on Single Noisy Point Clouds through Finetuning Data-Driven based Priors [53.6277160912059]
We propose a method that combines the strengths of data-driven and overfitting-based approaches for better generalization, faster inference, and higher accuracy in learning neural SDFs.
We introduce a novel statistical reasoning algorithm in local regions that can finetune data-driven priors without signed distance supervision, clean point clouds, or point normals.
arXiv Detail & Related papers (2024-10-25T16:48:44Z)
- Fast networked data selection via distributed smoothed quantile estimation [6.002041236376175]
We establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset.
The top-$k$ selection in a network can be formulated as a distributed nonsmooth convex optimization problem known as quantile estimation.
We characterize the complexity required to achieve top-$k$ selection, a challenging task due to the lack of strong convexity.
arXiv Detail & Related papers (2024-06-04T03:26:15Z) - AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set whose loss closely approximates that of the full set.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
arXiv Detail & Related papers (2023-05-19T19:59:52Z) - Coresets for Relational Data and The Applications [8.573878018370547]
A coreset is a small set that can preserve the structure of the original input data set.
We show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression, and SVM.
arXiv Detail & Related papers (2022-10-09T12:46:27Z) - CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We employ transformers in this work and incorporate them into a hierarchical framework for shape classification as well as part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art mean accuracy on shape classification and yields results on par with previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z) - A Novel Sequential Coreset Method for Gradient Descent Algorithms [21.40879052693993]
The coreset is a popular data compression technique that has been extensively studied.
We propose a new framework, termed ''sequential coreset'', which effectively avoids the pseudo-dimension and total sensitivity bound.
Our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to depend only poly-logarithmically on the dimension.
arXiv Detail & Related papers (2021-12-05T08:12:16Z) - Revisiting Point Cloud Simplification: A Learnable Feature Preserving
Approach [57.67932970472768]
Mesh and Point Cloud simplification methods aim to reduce the complexity of 3D models while retaining visual quality and relevant salient features.
We propose a fast point cloud simplification method by learning to sample salient points.
The proposed method relies on a graph neural network architecture trained to select an arbitrary, user-defined number of points from the input space and to re-arrange their positions so as to minimize the visual perception error.
arXiv Detail & Related papers (2021-09-30T10:23:55Z) - Communication-efficient distributed eigenspace estimation [31.69089186688224]
We develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix.
Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution.
We show that our algorithm achieves a similar error rate to that of a centralized estimator.
arXiv Detail & Related papers (2020-09-05T02:11:22Z) - Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.