A Constant-time Adaptive Negative Sampling
- URL: http://arxiv.org/abs/2012.15843v1
- Date: Thu, 31 Dec 2020 18:56:41 GMT
- Title: A Constant-time Adaptive Negative Sampling
- Authors: Shabnam Daghaghi, Tharun Medini, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava
- Abstract summary: We show a class of distributions where the sampling scheme is truly adaptive and provably generates negative samples in constant time.
Our implementation in C++ on commodity CPU is significantly faster, in terms of wall clock time, than optimized TensorFlow implementations of standard softmax or other sampling approaches on modern GPUs.
- Score: 33.585006286223994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Softmax classifiers with a very large number of classes naturally occur in
many applications such as natural language processing and information
retrieval. The calculation of full-softmax is very expensive from the
computational and energy perspective. There have been a variety of sampling
approaches to overcome this challenge, popularly known as negative sampling
(NS). Ideally, NS should sample negative classes from a distribution that is
dependent on the input data, the current parameters, and the correct positive
class. Unfortunately, due to the dynamically updated parameters and data
samples, there does not exist any sampling scheme that is truly adaptive and
also samples the negative classes in constant time every iteration. Therefore,
alternative heuristics like random sampling, static frequency-based sampling,
or learning-based biased sampling, which primarily trade either the sampling
cost or the adaptivity of samples per iteration, are adopted. In this paper, we
show a class of distributions where the sampling scheme is truly adaptive and
provably generates negative samples in constant time. Our implementation in C++
on commodity CPU is significantly faster, in terms of wall clock time, compared
to the most optimized TensorFlow implementations of standard softmax or other
sampling approaches on modern GPUs (V100s).
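For readers unfamiliar with the setup, the sketch below contrasts a full softmax over all classes with a negative-sampling update that scores only the positive class plus a few sampled negatives. It is a minimal NumPy illustration under assumed names (`sample_negatives`, `W`, `x`); the uniform sampler is a placeholder for the adaptive, constant-time scheme proposed in the paper, not the paper's method itself.

```python
import numpy as np

def full_softmax_probs(x, W):
    """Full softmax over every class: O(num_classes * dim) work per example."""
    logits = W @ x                       # shape (num_classes,)
    logits -= logits.max()               # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def sample_negatives(num_classes, positive, k, rng):
    """Uniform stand-in sampler; the paper replaces this with an adaptive,
    input-dependent scheme that still runs in constant time per draw."""
    neg = rng.choice(num_classes, size=k, replace=False)
    return neg[neg != positive]          # may return k-1 classes if the positive was drawn

def negative_sampling_probs(x, W, positive, k=5, rng=None):
    """Softmax restricted to the positive class plus k sampled negatives:
    roughly O(k * dim) work per example instead of O(num_classes * dim)."""
    if rng is None:
        rng = np.random.default_rng(0)
    negatives = sample_negatives(W.shape[0], positive, k, rng)
    classes = np.concatenate(([positive], negatives))
    logits = W[classes] @ x
    logits -= logits.max()
    p = np.exp(logits)
    return classes, p / p.sum()

# Toy usage: 100k classes, 128-dimensional inputs.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((100_000, 128))
x = rng.standard_normal(128)
classes, probs = negative_sampling_probs(x, W, positive=42, k=5, rng=rng)
```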
Related papers
- Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing [13.775902519100075]
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling.
Deep learning-based reconstruction has been a promising alternative to optimization-based reconstruction.
arXiv Detail & Related papers (2024-09-18T06:51:29Z)
- Restart Sampling for Improving Generative Processes [30.745245429072735]
ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time.
We propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction.
arXiv Detail & Related papers (2023-06-26T17:48:25Z)
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency.
We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question.
Our experiments show that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
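To illustrate the idea of a dynamic sample budget, here is a minimal sketch that draws answers one at a time and stops once a clear majority emerges. The `query_llm` callable and the simple majority-fraction stopping rule are assumptions for exposition, not the paper's exact criterion.

```python
from collections import Counter

def adaptive_consistency(query_llm, prompt, max_samples=40, threshold=0.95):
    """Draw answers one at a time and stop as soon as one answer clearly
    dominates, instead of always spending a fixed self-consistency budget.
    The stopping rule (majority fraction >= threshold after >= 3 draws)
    is a simplification, not the paper's exact criterion."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[query_llm(prompt)] += 1
        answer, votes = counts.most_common(1)[0]
        if n >= 3 and votes / n >= threshold:
            return answer, n            # early stop: saved max_samples - n calls
    return counts.most_common(1)[0][0], max_samples
```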
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
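As background on what running SGD with importance sampling means, the sketch below solves a toy least-squares problem, drawing rows with probability proportional to their squared norm and reweighting each gradient so the update stays unbiased. It is a generic illustration under assumed proposal probabilities, not the paper's sketch-based, sublinear-space data structure.

```python
import numpy as np

def importance_sampled_sgd(A, b, steps=1000, lr=1e-3, rng=None):
    """SGD for (1/2n)||Ax - b||^2 where row i is drawn with probability p_i
    proportional to ||a_i||^2 and its gradient is scaled by 1/(n * p_i),
    keeping the stochastic gradient an unbiased estimate of the full one."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = A.shape
    p = np.linalg.norm(A, axis=1) ** 2
    p /= p.sum()
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad = A[i] * (A[i] @ x - b[i]) / (n * p[i])
        x -= lr * grad
    return x
```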
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- On the Comparison between Cyclic Sampling and Random Reshuffling [27.27774034428748]
Cyclic sampling draws the samples in a fixed, cyclic order, which is less robust than reshuffling the samples periodically.
Existing works have established worst-case convergence rates for cyclic sampling, which are generally worse than those of random reshuffling.
In this paper, we find that a certain cyclic order can be much faster than random reshuffling and that it can be discovered at low cost.
arXiv Detail & Related papers (2021-04-25T09:29:43Z)
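To make the comparison concrete, the sketch below isolates the only difference between the two schemes: the index order visited in each epoch. It is a minimal illustration; the fixed permutation passed to the cyclic case is where a "good" cyclic order, like the one the paper searches for, would be plugged in.

```python
import numpy as np

def epoch_orders(n, epochs, scheme, order=None, rng=None):
    """Yield the per-epoch visiting order of the n samples.
    'cyclic'    : the same fixed permutation every epoch.
    'reshuffle' : a fresh random permutation every epoch."""
    if rng is None:
        rng = np.random.default_rng(0)
    fixed = np.arange(n) if order is None else np.asarray(order)
    for _ in range(epochs):
        if scheme == "cyclic":
            yield fixed
        elif scheme == "reshuffle":
            yield rng.permutation(n)
        else:
            raise ValueError(f"unknown scheme: {scheme}")

# Usage: feed these orders to an incremental (SGD-style) training loop.
for order in epoch_orders(n=6, epochs=2, scheme="reshuffle"):
    pass  # e.g., for i in order: take a gradient step on sample i
```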
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Minority Class Oversampling for Tabular Data with Deep Generative Models [4.976007156860967]
We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling.
Our experiments show that the choice of sampling method does not affect quality, but runtime varies widely.
We also observe that the improvements in performance metrics, while shown to be significant, are often minor in absolute terms.
arXiv Detail & Related papers (2020-05-07T21:35:57Z)
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
- Incremental Sampling Without Replacement for Sequence Models [39.3035292844624]
We present an elegant procedure for sampling without replacement from a broad class of randomized programs.
Our approach is incremental, i.e., samples can be drawn one at a time, allowing for increased flexibility.
arXiv Detail & Related papers (2020-02-21T00:12:01Z)
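For intuition about the incremental interface, the sketch below draws categorical samples one at a time without replacement by zeroing out the mass of indices already drawn and renormalizing. It only illustrates the one-at-a-time contract; the class name and approach are assumptions, not the paper's procedure for general randomized programs.

```python
import numpy as np

class IncrementalWithoutReplacement:
    """Draw samples one at a time from a categorical distribution, never
    repeating an index; each draw renormalizes the remaining mass."""
    def __init__(self, probs, rng=None):
        self.remaining = np.asarray(probs, dtype=float).copy()
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def draw(self):
        total = self.remaining.sum()
        if total <= 0:
            raise ValueError("probability mass exhausted")
        i = self.rng.choice(len(self.remaining), p=self.remaining / total)
        self.remaining[i] = 0.0          # this index can never be drawn again
        return i

# Usage: three draws, guaranteed distinct.
sampler = IncrementalWithoutReplacement([0.5, 0.2, 0.2, 0.1])
draws = [sampler.draw() for _ in range(3)]
```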