Related papers: Sequential Community Mode Estimation

Sequential Community Mode Estimation

URL: http://arxiv.org/abs/2111.08535v1
Date: Tue, 16 Nov 2021 15:05:40 GMT
Title: Sequential Community Mode Estimation
Authors: Shubham Anand Jain, Shreyas Goenka, Divyam Bapna, Nikhil Karamchandani, Jayakrishnan Nair
Abstract summary: We study the problem of identifying the largest community within the population via sequential, random sampling of individuals. We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm.
Score: 7.693649834261879
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider a population, partitioned into a set of communities, and study the problem of identifying the largest community within the population via sequential, random sampling of individuals. There are multiple sampling domains, referred to as \emph{boxes}, which also partition the population. Each box may consist of individuals of different communities, and each community may in turn be spread across multiple boxes. The learning agent can, at any time, sample (with replacement) a random individual from any chosen box; when this is done, the agent learns the community the sampled individual belongs to, and also whether or not this individual has been sampled before. The goal of the agent is to minimize the probability of mis-identifying the largest community in a \emph{fixed budget} setting, by optimizing both the sampling strategy as well as the decision rule. We propose and analyse novel algorithms for this problem, and also establish information theoretic lower bounds on the probability of error under any algorithm. In several cases of interest, the exponential decay rates of the probability of error under our algorithms are shown to be optimal up to constant factors. The proposed algorithms are further validated via simulations on real-world datasets.

Related papers

Random is Faster than Systematic in Multi-Objective Local Search [8.143459364469708]
In multi-objective local search algorithms, a common practice is to maintain an archive of all non-dominated solutions found so far.<n>A central issue in this process is how to explore the neighbourhood of a selected solution.<n>We show that the random sampling method is consistently faster than the systematic exploration method across a range of multi-objective problems.
arXiv Detail & Related papers (2026-01-09T21:27:30Z)
Scalable Decentralized Algorithms for Online Personalized Mean Estimation [12.002609934938224]
This study focuses on a simplified version of the overarching problem, where each agent collects samples from a real-valued distribution over time to estimate its mean. We introduce two collaborative mean estimation algorithms: one draws inspiration from belief propagation, while the other employs a consensus-based approach.
arXiv Detail & Related papers (2024-02-20T08:30:46Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
Bivariate Estimation-of-Distribution Algorithms Can Find an Exponential Number of Optima [12.009357100208353]
We propose the test function EqualBlocksOneMax (EBOM) to support the study of how optimization algorithms handle large sets of optima. We show that EBOM behaves very similarly to a theoretically ideal model for EBOM, which samples each of the exponentially many optima with the same maximal probability.
arXiv Detail & Related papers (2023-10-06T06:32:07Z)
Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment. Standard sampling-based algorithms can be used to search for solutions in the robots' joint space. We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods. We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z)
Is it easier to count communities than find them? [82.90505487525533]
We consider certain hypothesis testing problems between models with different community structures. We show that testing between two options is as hard as finding the communities.
arXiv Detail & Related papers (2022-12-21T09:35:19Z)
BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution. Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
arXiv Detail & Related papers (2022-03-31T05:19:38Z)
Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback. It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as its subroutines. We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower boundally, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z)
Frequent Itemset-driven Search for Finding Minimum Node Separators in Complex Networks [61.2383572324176]
We propose a frequent itemset-driven search approach, which integrates the concept of frequent itemset mining in data mining into the well-known memetic search framework. It iteratively employs the frequent itemset recombination operator to generate promising offspring solution based on itemsets that frequently occur in high-quality solutions. In particular, it discovers 29 new upper bounds and matches 18 previous best-known bounds.
arXiv Detail & Related papers (2022-01-18T11:16:40Z)
Population based change-point detection for the identification of homozygosity islands [0.0]
We introduce a penalized maximum likelihood approach that can be efficiently computed by a dynamic programming algorithm or approximated by a fast greedy binary splitting algorithm. We prove both algorithms converge almost surely to the set of change-points under very general assumptions on the distribution and independent sampling of the random vector. This new approach is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population.
arXiv Detail & Related papers (2021-11-19T12:53:41Z)
Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment. Policy gradients for local search are often obtained from random perturbations. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.