Related papers: How to Solve Fair $k$-Center in Massive Data Models

How to Solve Fair $k$-Center in Massive Data Models

URL: http://arxiv.org/abs/2002.07682v2
Date: Mon, 24 Feb 2020 16:55:27 GMT
Title: How to Solve Fair $k$-Center in Massive Data Models
Authors: Ashish Chiplunkar, Sagar Kale, Sivaramakrishnan Natarajan Ramamoorthy
Abstract summary: We design new streaming and distributed algorithms for the fair $k$-center problem. Our main contributions are: (a) the first distributed algorithm; and (b) a two-pass streaming algorithm with a provable approximation guarantee.
Score: 5.3283669037198615
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fueled by massive data, important decision making is being automated with the help of algorithms, therefore, fairness in algorithms has become an especially important research topic. In this work, we design new streaming and distributed algorithms for the fair $k$-center problem that models fair data summarization. The streaming and distributed models of computation have an attractive feature of being able to handle massive data sets that do not fit into main memory. Our main contributions are: (a) the first distributed algorithm; which has provably constant approximation ratio and is extremely parallelizable, and (b) a two-pass streaming algorithm with a provable approximation guarantee matching the best known algorithm (which is not a streaming algorithm). Our algorithms have the advantages of being easy to implement in practice, being fast with linear running times, having very small working memory and communication, and outperforming existing algorithms on several real and synthetic data sets. To complement our distributed algorithm, we also give a hardness result for natural distributed algorithms, which holds for even the special case of $k$-center.

Related papers

A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data. We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
Learning the hub graphical Lasso model with the structured sparsity via an efficient algorithm [1.0923877073891446]
We introduce a two-phase algorithm to estimate hub graphical models. The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers. It then warms a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks.
arXiv Detail & Related papers (2023-08-17T08:24:28Z)
ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms [5.478671305092084]
We introduce ParlayANN, a library of deterministic and parallel graph-based approximate nearest neighbor search algorithms. We develop novel parallel implementations for four state-of-the-art graph-based ANNS algorithms that scale to billion-scale datasets.
arXiv Detail & Related papers (2023-05-07T19:28:23Z)
Dual Algorithmic Reasoning [9.701208207491879]
We propose to learn algorithms by exploiting duality of the underlying algorithmic problem. We demonstrate that simultaneously learning the dual definition of these optimisation problems in algorithmic learning allows for better learning. We then validate the real-world utility of our dual algorithmic reasoner by deploying it on a challenging brain vessel classification task.
arXiv Detail & Related papers (2023-02-09T08:46:23Z)
Streaming Algorithms for High-Dimensional Robust Statistics [43.106438224356175]
We develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements. Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model.
arXiv Detail & Related papers (2022-04-26T15:57:07Z)
Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification. We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z)
Learning to Hash Robustly, with Guarantees [79.68057056103014]
In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. Our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.
arXiv Detail & Related papers (2021-08-11T20:21:30Z)
Adversarial Robustness of Streaming Algorithms through Importance Sampling [29.957317489789745]
We introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks. Our results are based on a simple, but powerful, observation that many importance sampling-based algorithms give rise to adversarial robustness. We empirically confirm the robustness of our algorithms on various adversarial attacks and demonstrate that by contrast, some common existing algorithms are not robust.
arXiv Detail & Related papers (2021-06-28T19:24:11Z)
Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models. The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning. We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data. To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising [51.97494906131859]
Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc. Existing exact and approximate algorithms usually fail in such settings due to either requiring intolerable running time or too much computation resource. We propose textttNeuSearcher which leverages the knowledge learned from previously instances to solve new problem instances.
arXiv Detail & Related papers (2020-05-09T02:48:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.