Efficient Nearest Neighbor Language Models
- URL: http://arxiv.org/abs/2109.04212v1
- Date: Thu, 9 Sep 2021 12:32:28 GMT
- Title: Efficient Nearest Neighbor Language Models
- Authors: Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick
- Abstract summary: Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference while retaining comparable performance.
- Score: 114.40866461741795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-parametric neural language models (NLMs) learn predictive distributions
of text utilizing an external datastore, which allows them to learn through
explicitly memorizing the training datapoints. While effective, these models
often require retrieval from a large datastore at test time, significantly
increasing the inference overhead and thus limiting the deployment of
non-parametric NLMs in practical applications. In this paper, we take the
recently proposed $k$-nearest neighbors language model (Khandelwal et al.,
2019) as an example, exploring methods to improve its efficiency along various
dimensions. Experiments on the standard WikiText-103 benchmark and
domain-adaptation datasets show that our methods are able to achieve up to a 6x
speed-up in inference speed while retaining comparable performance. The
empirical analysis we present may provide guidelines for future research
seeking to develop or deploy more efficient non-parametric NLMs.
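For readers unfamiliar with the underlying model, the $k$NN-LM interpolates the parametric LM's next-token distribution with a distribution built from the $k$ nearest entries in a datastore of (context representation, next token) pairs; the retrieval needed to build that second distribution is the inference overhead the paper targets. Below is a minimal NumPy sketch of the interpolation step, assuming a brute-force datastore search: the function name, argument names, and default values are illustrative, not taken from the authors' code, and a practical system would replace the exhaustive distance computation with an approximate index such as FAISS.

```python
import numpy as np

def knn_lm_next_token_probs(query, lm_probs, keys, values,
                            k=8, temperature=1.0, lam=0.25):
    """Illustrative kNN-LM interpolation (names and defaults are assumptions).

    query:    context representation for the current position, shape (d,)
    lm_probs: parametric LM distribution over the vocabulary, shape (V,)
    keys:     datastore context representations, shape (N, d)
    values:   datastore next-token ids, shape (N,)
    """
    # Squared L2 distance from the query to every stored context
    # (a deployed system would use an approximate index instead).
    dists = np.sum((keys - query) ** 2, axis=1)

    # Indices of the k closest datastore entries.
    nn = np.argpartition(dists, k)[:k]

    # Turn negative distances into neighbor weights with a softmax.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()

    # Scatter the neighbor weights onto their stored next-token ids.
    knn_probs = np.zeros_like(lm_probs)
    np.add.at(knn_probs, values[nn], weights)

    # Final kNN-LM prediction: interpolate the two distributions.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

The speed-ups reported in the abstract come from making this retrieval step cheaper along several dimensions rather than from changing the interpolation itself.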
Related papers
- Few-shot learning for automated content analysis: Efficient coding of arguments and claims in the debate on arms deliveries to Ukraine [0.9576975587953563]
Pre-trained language models (PLMs) based on transformer neural networks offer great opportunities to improve automatic content analysis in communication science.
Three characteristics have so far impeded the widespread adoption of these methods in the disciplines that would apply them: the dominance of English language models in NLP research, the necessary computing resources, and the effort required to produce training data to fine-tune PLMs.
We test our approach on a realistic use case from communication science to automatically detect claims and arguments together with their stance in the German news debate on arms deliveries to Ukraine.
arXiv Detail & Related papers (2023-12-28T11:39:08Z)
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Revisiting Neural Scaling Laws in Language and Vision [43.57394336742374]
We argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting parameters.
We present a recipe for estimating scaling law parameters reliably from learning curves.
We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains.
arXiv Detail & Related papers (2022-09-13T09:41:51Z)
- Offline RL for Natural Language Generation with Implicit Language Q Learning [87.76695816348027]
Large language models can be inconsistent when it comes to completing user-specified tasks.
We propose a novel RL method, implicit language Q-learning (ILQL), that combines the flexible utility framework of RL with the ability of supervised learning to leverage previously collected data.
In addition to empirically validating ILQL, we present a detailed empirical analysis of situations where offline RL can be useful in natural language generation settings.
arXiv Detail & Related papers (2022-06-05T18:38:42Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning [30.5853328612593]
In this work, we explore fine-tuning methods for BERT -- a pre-trained Transformer-based language model.
Our experimental results show an advantage in model performance by maximizing the approximate knowledge gain of the model.
We analyze the benefits of freezing layers of the language model during fine-tuning to reduce the number of trainable parameters.
arXiv Detail & Related papers (2020-12-04T08:34:39Z)
- Improved Semantic Role Labeling using Parameterized Neighborhood Memory Adaptation [22.064890647610348]
We propose a parameterized neighborhood memory adaptive (PNMA) method that uses a parameterized representation of the nearest neighbors of tokens in a memory of activations.
We empirically show that PNMA consistently improves the SRL performance of the base model irrespective of the type of word embeddings used.
arXiv Detail & Related papers (2020-11-29T22:51:25Z)
- Few-shot Learning for Spatial Regression [31.022722103424684]
We propose a few-shot learning method for spatial regression.
Our model is trained using spatial datasets on various attributes in various regions.
In our experiments, we demonstrate that the proposed method achieves better predictive performance than existing meta-learning methods.
arXiv Detail & Related papers (2020-10-09T04:05:01Z)