Related papers: Mini-batch Submodular Maximization

URL: http://arxiv.org/abs/2401.12478v2
Date: Wed, 02 Oct 2024 09:02:19 GMT
Title: Mini-batch Submodular Maximization
Authors: Gregory Schwartzman
Abstract summary: We present the first mini-batch algorithm for maximizing a monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We consider two sampling approaches: uniform and weighted. Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling.
Score: 5.439020425819001
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present the first mini-batch algorithm for maximizing a non-negative monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We consider two sampling approaches: uniform and weighted. We first show that mini-batch with weighted sampling improves over the state of the art sparsifier based approach both in theory and in practice. Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling. However, it is impossible to explain this using worst-case analysis. Our main contribution is using smoothed analysis to provide a theoretical foundation for our experimental results. We show that, under very mild assumptions, uniform sampling is superior for both the mini-batch and the sparsifier approaches. We empirically verify that these assumptions hold for our datasets. Uniform sampling is simple to implement and has complexity independent of $N$, making it the perfect candidate to tackle massive real-world datasets.

Related papers

Constrained and Composite Sampling via Proximal Sampler [2.087898608419977]
We study two log-concave sampling problems: constrained sampling and composite sampling. The main challenge is enforcing feasibility without degrading mixing. In composite sampling, the target is proportional to $\exp(-f(x)-h(x))$ with closed and convex $f$ and $h$.
arXiv Detail & Related papers (2026-02-16T05:36:36Z)
Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity [9.42598427201735]
We introduce Wedge Sampling, a new non-adaptive sampling scheme for low-rank tensor completion. We study recovery of an order-$k$ low-rank tensor of dimension $n \times \cdots \times n$ from a subset of its entries.
arXiv Detail & Related papers (2026-02-05T16:47:13Z)
Nearly Optimal Sample Complexity for Learning with Label Proportions [54.67830198790247]
We investigate Learning from Label Proportions (LLP), a partial information setting where examples in a training set are grouped into bags. Despite the partial observability, the goal is still to achieve small regret at the level of individual examples. We give results on the sample complexity of LLP under square loss, showing that our sample complexity is essentially optimal.
arXiv Detail & Related papers (2025-05-08T15:45:23Z)
Differentially Private Multi-Sampling from Distributions [4.292685318253575]
We study the sample complexity of DP single-sampling, i.e., the minimum number of samples needed to perform this task. We define two variants of multi-sampling, where the goal is to privately approximate $m>1$ samples.
arXiv Detail & Related papers (2024-12-13T19:14:05Z)
Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel [10.840582511203024]
We show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds.
arXiv Detail & Related papers (2024-06-03T01:34:34Z)
Simple and effective data augmentation for compositional generalization [64.00420578048855]
We show that data augmentation methods that sample MRs and backtranslate them can be effective for compositional generalization. Remarkably, sampling from a uniform distribution performs almost as well as sampling from the test distribution.
arXiv Detail & Related papers (2024-01-18T09:13:59Z)
Improved Active Learning via Dependent Leverage Score Sampling [8.400581768343804]
We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting. We propose an easily implemented method based on the pivotal sampling algorithm. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to $50\%$.
arXiv Detail & Related papers (2023-10-08T01:51:30Z)
On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms. We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z)
Rethinking Collaborative Metric Learning: Toward an Efficient Alternative without Negative Sampling [156.7248383178991]
The Collaborative Metric Learning (CML) paradigm has aroused wide interest in the area of recommendation systems (RS). We find that negative sampling would lead to a biased estimation of the generalization error. Motivated by this, we propose an efficient alternative without negative sampling for CML named Sampling-Free Collaborative Metric Learning (SFCML).
arXiv Detail & Related papers (2022-06-23T08:50:22Z)
Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism. We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification [28.128464387420216]
We show that learning is fundamentally constrained by a lack of minority group samples. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal.
arXiv Detail & Related papers (2022-05-26T00:35:11Z)
Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems. We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z)
Is Simple Uniform Sampling Effective for Center-Based Clustering with Outliers: When and Why? [14.757827466271209]
We propose a simple uniform sampling framework for solving three representative center-based clustering with outliers problems. Our analysis is fundamentally different from the previous (uniform and non-uniform) sampling based ideas.
arXiv Detail & Related papers (2021-02-28T16:43:37Z)
Learning Entangled Single-Sample Distributions via Iterative Trimming [28.839136703139225]
We analyze a simple and computationally efficient method based on iteratively trimming samples and re-estimating the parameter on the trimmed sample set. We show that the method in logarithmic iterations outputs an estimation whose error only depends on the noise level of the $\lceil \alpha n \rceil$-th noisiest data point.
arXiv Detail & Related papers (2020-04-20T18:37:43Z)
The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator. We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors. Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.