Sample-Efficient Clustering and Conquer Procedures for Parallel
Large-Scale Ranking and Selection
- URL: http://arxiv.org/abs/2402.02196v2
- Date: Tue, 13 Feb 2024 02:31:19 GMT
- Title: Sample-Efficient Clustering and Conquer Procedures for Parallel
Large-Scale Ranking and Selection
- Authors: Zishi Zhang, Yijie Peng
- Abstract summary: In parallel computing environments, correlation-based clustering can achieve an $mathcalO(p)$ sample complexity reduction rate.
In large-scale AI applications such as neural architecture search, a screening-free version of our procedure surprisingly surpasses fully-sequential benchmarks in terms of sample efficiency.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose novel "clustering and conquer" procedures for the parallel
large-scale ranking and selection (R&S) problem, which leverage correlation
information for clustering to break the bottleneck of sample efficiency. In
parallel computing environments, correlation-based clustering can achieve an
$\mathcal{O}(p)$ sample complexity reduction rate, which is the optimal
reduction rate theoretically attainable. Our proposed framework is versatile,
allowing for seamless integration of various prevalent R&S methods under both
fixed-budget and fixed-precision paradigms. It can achieve improvements without
the necessity of highly accurate correlation estimation and precise clustering.
In large-scale AI applications such as neural architecture search, a
screening-free version of our procedure surprisingly surpasses fully-sequential
benchmarks in terms of sample efficiency. This suggests that leveraging
valuable structural information, such as correlation, is a viable path to
bypassing the traditional need for screening via pairwise comparison--a step
previously deemed essential for high sample efficiency but problematic for
parallelization. Additionally, we propose a parallel few-shot clustering
algorithm tailored for large-scale problems.
Related papers
- Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering [13.638434337947302]
FSSMSC is a novel solution to the high computational complexity commonly found in existing approaches.
The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks.
The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.
arXiv Detail & Related papers (2024-08-11T06:54:00Z) - Query-Efficient Correlation Clustering with Noisy Oracle [17.11782578276788]
We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB)
We design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation and study their theoretical guarantees.
Our results are the first examples of clustering-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
arXiv Detail & Related papers (2024-02-02T13:31:24Z) - Sparse Variational Student-t Processes [8.46450148172407]
Student-t Processes are used to model heavy-tailed distributions and datasets with outliers.
We propose a sparse representation framework to allow Student-t Processes to be more flexible for real-world datasets.
We evaluate two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle.
arXiv Detail & Related papers (2023-12-09T12:55:20Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Global Optimization for Cardinality-constrained Minimum Sum-of-Squares
Clustering via Semidefinite Programming [1.3053649021965603]
The minimum sum-of-squares clustering (MSSC) has been recently extended to exploit prior knowledge on the cardinality of each cluster.
We propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC.
For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node.
arXiv Detail & Related papers (2022-09-19T10:19:06Z) - Late Fusion Multi-view Clustering via Global and Local Alignment
Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most of existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic
Convolution [136.7261709896713]
We propose a data-driven approach that generates the appropriate convolution kernels to apply in response to the nature of the instances.
The proposed method achieves promising results on both ScanetNetV2 and S3DIS.
It also improves inference speed by more than 25% over the current state-of-the-art.
arXiv Detail & Related papers (2020-11-26T14:56:57Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Optimizing generalization on the train set: a novel gradient-based
framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.