A Constant-time Adaptive Negative Sampling
- URL: http://arxiv.org/abs/2012.15843v1
- Date: Thu, 31 Dec 2020 18:56:41 GMT
- Title: A Constant-time Adaptive Negative Sampling
- Authors: Shabnam Daghaghi, Tharun Medini, Beidi Chen, Mengnan Zhao, Anshumali Shrivastava
- Abstract summary: We show a class of distributions where the sampling scheme is truly adaptive and provably generates negative samples in constant time.
Our implementation in C++ on commodity CPU is significantly faster, in terms of wall clock time, than optimized TensorFlow implementations of standard softmax or other sampling approaches on modern GPUs.
- Score: 33.585006286223994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Softmax classifiers with a very large number of classes naturally occur in
many applications such as natural language processing and information
retrieval. The calculation of full-softmax is very expensive from the
computational and energy perspective. There have been a variety of sampling
approaches to overcome this challenge, popularly known as negative sampling
(NS). Ideally, NS should sample negative classes from a distribution that is
dependent on the input data, the current parameters, and the correct positive
class. Unfortunately, due to the dynamically updated parameters and data
samples, there does not exist any sampling scheme that is truly adaptive and
also samples the negative classes in constant time every iteration. Therefore,
alternative heuristics like random sampling, static frequency-based sampling,
or learning-based biased sampling, which primarily trade either the sampling
cost or the adaptivity of samples per iteration, are adopted. In this paper, we
show a class of distributions where the sampling scheme is truly adaptive and
provably generates negative samples in constant time. Our implementation in C++
on commodity CPU is significantly faster, in terms of wall clock time, compared
to the most optimized TensorFlow implementations of standard softmax or other
sampling approaches on modern GPUs (V100s).
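For readers unfamiliar with the setup, the sketch below contrasts a full softmax over all classes with a negative-sampling update that scores only the positive class plus a few sampled negatives. It is a minimal NumPy illustration under assumed names (`sample_negatives`, `W`, `x`); the uniform sampler is a placeholder for the adaptive, constant-time scheme proposed in the paper, not the paper's method itself.

```python
import numpy as np

def full_softmax_probs(x, W):
    """Full softmax over every class: O(num_classes * dim) work per example."""
    logits = W @ x                       # shape (num_classes,)
    logits -= logits.max()               # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def sample_negatives(num_classes, positive, k, rng):
    """Uniform stand-in sampler; the paper replaces this with an adaptive,
    input-dependent scheme that still runs in constant time per draw."""
    neg = rng.choice(num_classes, size=k, replace=False)
    return neg[neg != positive]          # may return k-1 classes if the positive was drawn

def negative_sampling_probs(x, W, positive, k=5, rng=None):
    """Softmax restricted to the positive class plus k sampled negatives:
    roughly O(k * dim) work per example instead of O(num_classes * dim)."""
    if rng is None:
        rng = np.random.default_rng(0)
    negatives = sample_negatives(W.shape[0], positive, k, rng)
    classes = np.concatenate(([positive], negatives))
    logits = W[classes] @ x
    logits -= logits.max()
    p = np.exp(logits)
    return classes, p / p.sum()

# Toy usage: 100k classes, 128-dimensional inputs.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((100_000, 128))
x = rng.standard_normal(128)
classes, probs = negative_sampling_probs(x, W, positive=42, k=5, rng=rng)
```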
Related papers
- Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing [13.775902519100075]
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling.
Deep learning-based reconstruction has been a promising alternative to optimization-based reconstruction.
arXiv Detail & Related papers (2024-09-18T06:51:29Z)
- Restart Sampling for Improving Generative Processes [30.745245429072735]
ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time.
We propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction.
arXiv Detail & Related papers (2023-06-26T17:48:25Z)
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency.
We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question.
Our experiments show that Adaptive-Consistency reduces the sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
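To illustrate the idea of a dynamic sample budget, here is a minimal sketch that draws answers one at a time and stops once a clear majority emerges. The `query_llm` callable and the simple majority-fraction stopping rule are assumptions for exposition, not the paper's exact criterion.

```python
from collections import Counter

def adaptive_consistency(query_llm, prompt, max_samples=40, threshold=0.95):
    """Draw answers one at a time and stop as soon as one answer clearly
    dominates, instead of always spending a fixed self-consistency budget.
    The stopping rule (majority fraction >= threshold after >= 3 draws)
    is a simplification, not the paper's exact criterion."""
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[query_llm(prompt)] += 1
        answer, votes = counts.most_common(1)[0]
        if n >= 3 and votes / n >= threshold:
            return answer, n            # early stop: saved max_samples - n calls
    return counts.most_common(1)[0][0], max_samples
```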
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% over four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
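As background on what running SGD with importance sampling means, the sketch below solves a toy least-squares problem, drawing rows with probability proportional to their squared norm and reweighting each gradient so the update stays unbiased. It is a generic illustration under assumed proposal probabilities, not the paper's sketch-based, sublinear-space data structure.

```python
import numpy as np

def importance_sampled_sgd(A, b, steps=1000, lr=1e-3, rng=None):
    """SGD for (1/2n)||Ax - b||^2 where row i is drawn with probability p_i
    proportional to ||a_i||^2 and its gradient is scaled by 1/(n * p_i),
    keeping the stochastic gradient an unbiased estimate of the full one."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = A.shape
    p = np.linalg.norm(A, axis=1) ** 2
    p /= p.sum()
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.choice(n, p=p)
        grad = A[i] * (A[i] @ x - b[i]) / (n * p[i])
        x -= lr * grad
    return x
```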
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- On the Comparison between Cyclic Sampling and Random Reshuffling [27.27774034428748]
Cyclic sampling draws the samples in a fixed, cyclic order, which is less robust than reshuffling the samples periodically.
Existing works have established worst-case convergence rates for cyclic sampling, which are generally worse than those of random reshuffling.
In this paper, we find that a certain cyclic order can be much faster than random reshuffling and that it can be discovered at low cost.
arXiv Detail & Related papers (2021-04-25T09:29:43Z)
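To make the comparison concrete, the sketch below isolates the only difference between the two schemes: the index order visited in each epoch. It is a minimal illustration; the fixed permutation passed to the cyclic case is where a "good" cyclic order, like the one the paper searches for, would be plugged in.

```python
import numpy as np

def epoch_orders(n, epochs, scheme, order=None, rng=None):
    """Yield the per-epoch visiting order of the n samples.
    'cyclic'    : the same fixed permutation every epoch.
    'reshuffle' : a fresh random permutation every epoch."""
    if rng is None:
        rng = np.random.default_rng(0)
    fixed = np.arange(n) if order is None else np.asarray(order)
    for _ in range(epochs):
        if scheme == "cyclic":
            yield fixed
        elif scheme == "reshuffle":
            yield rng.permutation(n)
        else:
            raise ValueError(f"unknown scheme: {scheme}")

# Usage: feed these orders to an incremental (SGD-style) training loop.
for order in epoch_orders(n=6, epochs=2, scheme="reshuffle"):
    pass  # e.g., for i in order: take a gradient step on sample i
```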
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
- Minority Class Oversampling for Tabular Data with Deep Generative Models [4.976007156860967]
We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling.
Our experiments show that the choice of sampling method does not affect quality, but runtime varies widely.
We also observe that the improvements in performance metrics, while shown to be significant, are often minor in absolute terms.
arXiv Detail & Related papers (2020-05-07T21:35:57Z)
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
- Incremental Sampling Without Replacement for Sequence Models [39.3035292844624]
We present an elegant procedure for sampling without replacement from a broad class of randomized programs.
Our approach is incremental, i.e., samples can be drawn one at a time, allowing for increased flexibility.
arXiv Detail & Related papers (2020-02-21T00:12:01Z)
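For intuition about the incremental interface, the sketch below draws categorical samples one at a time without replacement by zeroing out the mass of indices already drawn and renormalizing. It only illustrates the one-at-a-time contract; the class name and approach are assumptions, not the paper's procedure for general randomized programs.

```python
import numpy as np

class IncrementalWithoutReplacement:
    """Draw samples one at a time from a categorical distribution, never
    repeating an index; each draw renormalizes the remaining mass."""
    def __init__(self, probs, rng=None):
        self.remaining = np.asarray(probs, dtype=float).copy()
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def draw(self):
        total = self.remaining.sum()
        if total <= 0:
            raise ValueError("probability mass exhausted")
        i = self.rng.choice(len(self.remaining), p=self.remaining / total)
        self.remaining[i] = 0.0          # this index can never be drawn again
        return i

# Usage: three draws, guaranteed distinct.
sampler = IncrementalWithoutReplacement([0.5, 0.2, 0.2, 0.1])
draws = [sampler.draw() for _ in range(3)]
```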