How to Solve Fair $k$-Center in Massive Data Models
- URL: http://arxiv.org/abs/2002.07682v2
- Date: Mon, 24 Feb 2020 16:55:27 GMT
- Title: How to Solve Fair $k$-Center in Massive Data Models
- Authors: Ashish Chiplunkar, Sagar Kale, Sivaramakrishnan Natarajan Ramamoorthy
- Abstract summary: We design new streaming and distributed algorithms for the fair $k$-center problem.
Our main contributions are: (a) the first distributed algorithm, and (b) a two-pass streaming algorithm with a provable approximation guarantee.
- Score: 5.3283669037198615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fueled by massive data, important decision making is being automated with the
help of algorithms; therefore, fairness in algorithms has become an especially
important research topic. In this work, we design new streaming and distributed
algorithms for the fair $k$-center problem that models fair data summarization.
The streaming and distributed models of computation have an attractive feature
of being able to handle massive data sets that do not fit into main memory. Our
main contributions are: (a) the first distributed algorithm, which has a provably
constant approximation ratio and is extremely parallelizable, and (b) a
two-pass streaming algorithm with a provable approximation guarantee matching
the best known algorithm (which is not a streaming algorithm). Our algorithms
have the advantages of being easy to implement in practice, being fast with
linear running times, having very small working memory and communication, and
outperforming existing algorithms on several real and synthetic data sets. To
complement our distributed algorithm, we also give a hardness result for
natural distributed algorithms, which holds for even the special case of
$k$-center.
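The abstract does not spell out the problem, so the following is only a minimal sketch of the fair $k$-center objective under the commonly used formulation, which is assumed here: every point belongs to a group, each group has a cap on how many centers may be drawn from it, and the cost of a solution is the largest distance from any point to its nearest center. The names `is_fair`, `k_center_cost`, `group_of`, and `caps` are hypothetical and chosen for illustration; this evaluates a candidate solution and is not the paper's streaming or distributed algorithm.

```python
import math
from collections import Counter


def is_fair(center_ids, group_of, caps):
    """Return True if at most caps[g] centers come from each group g.

    Assumption: fairness is expressed as a per-group cap on chosen centers.
    Groups missing from caps are treated as having a cap of 0.
    """
    used = Counter(group_of[i] for i in center_ids)
    return all(used[g] <= caps.get(g, 0) for g in used)


def k_center_cost(points, center_ids):
    """Return the largest distance from any point to its nearest center."""
    return max(
        min(math.dist(p, points[c]) for c in center_ids)
        for p in points
    )


# Toy instance: two well-separated pairs, each pair mixing groups "a" and "b".
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
group_of = ["a", "b", "a", "b"]
caps = {"a": 1, "b": 1}

fair_choice = [0, 3]    # one center from each group
unfair_choice = [0, 2]  # two centers from group "a", exceeding its cap
```

On this toy instance, `fair_choice` satisfies the caps and covers every point within distance 1, while `unfair_choice` covers the points equally well but violates the cap for group "a".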
Related papers
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Learning the hub graphical Lasso model with the structured sparsity via an efficient algorithm [1.0923877073891446]
We introduce a two-phase algorithm to estimate hub graphical models.
The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers.
It then warm-starts a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks.
arXiv Detail & Related papers (2023-08-17T08:24:28Z)
- ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms.
We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z)
- Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem.
We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning.
We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z)
- Streaming Algorithms for High-Dimensional Robust Statistics [43.106438224356175]
We develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements.
Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model.
arXiv Detail & Related papers (2022-04-26T15:57:07Z)
- Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z)
- Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms.
We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically.
Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design and has been widely applied in economic markets, labor markets, etc.
Existing exact and approximate algorithms usually fail in such settings because they require either intolerable running time or too many computational resources.
We propose NeuSearcher, which leverages the knowledge learned from previous instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.