Adaptive Semiparametric Language Models
- URL: http://arxiv.org/abs/2102.02557v1
- Date: Thu, 4 Feb 2021 11:47:03 GMT
- Title: Adaptive Semiparametric Language Models
- Authors: Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong
- Abstract summary: We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component.
Experiments on word-based and character-based language modeling datasets demonstrate the efficacy of our proposed method.
- Score: 17.53604394786977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a language model that combines a large parametric neural network
(i.e., a transformer) with a non-parametric episodic memory component in an
integrated architecture. Our model uses extended short-term context by caching
local hidden states -- similar to transformer-XL -- and global long-term memory
by retrieving a set of nearest neighbor tokens at each timestep. We design a
gating function to adaptively combine multiple information sources to make a
prediction. This mechanism allows the model to use either local context,
short-term memory, or long-term memory (or any combination of them) on an ad
hoc basis depending on the context. Experiments on word-based and
character-based language modeling datasets demonstrate the efficacy of our
proposed method compared to strong baselines.
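To make the gating idea concrete, here is a minimal sketch of one way such a gate could work (PyTorch; the class name GatedMemoryCombiner and the single scalar gate are illustrative assumptions, not the authors' implementation): a sigmoid gate computed from the local hidden state decides how much of a retrieved long-term-memory summary to mix in before the output softmax.

import torch
import torch.nn as nn

class GatedMemoryCombiner(nn.Module):
    """Hypothetical sketch: mix the transformer's local hidden state (which already
    carries the Transformer-XL-style cached short-term context) with a summary of
    retrieved nearest-neighbor representations via a learned sigmoid gate."""
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)        # scalar gate per position
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, local_h: torch.Tensor, memory_h: torch.Tensor) -> torch.Tensor:
        # local_h, memory_h: (batch, hidden_dim)
        g = torch.sigmoid(self.gate(local_h))       # in (0, 1), context dependent
        mixed = g * memory_h + (1.0 - g) * local_h  # combination of information sources
        return torch.log_softmax(self.out(mixed), dim=-1)

# Toy usage: next-token log-probabilities from both information sources.
combiner = GatedMemoryCombiner(hidden_dim=16, vocab_size=100)
log_probs = combiner(torch.randn(2, 16), torch.randn(2, 16))  # shape (2, 100)

Because the gate depends on the current hidden state, the model can lean on local context, cached short-term memory, or retrieved long-term memory as the input demands, which matches the ad hoc behavior described in the abstract.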
Related papers
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel approach for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
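As a rough sketch of the general idea (an assumption based only on this summary; the function name and the exact objective are hypothetical, not the paper's formulation), in-batch hidden states can serve as an accessible memory by letting positions whose target token matches the current target contribute additional positive probability mass:

import torch

def in_batch_memory_loss(hidden, targets, embedding, temperature=1.0):
    # hidden: (n, d) hidden states for n in-batch positions
    # targets: (n,) gold next-token ids; embedding: (vocab, d) output embeddings
    vocab_logits = hidden @ embedding.t()                        # standard softmax logits
    mem_logits = hidden @ hidden.t() / temperature               # similarity to in-batch memory
    mem_logits.fill_diagonal_(float("-inf"))                     # a position is not its own memory
    same_target = targets.unsqueeze(0) == targets.unsqueeze(1)   # memory entries storing the gold token
    pos_mem = mem_logits.masked_fill(~same_target, float("-inf"))

    # Normalize over the vocabulary plus all memory entries; reward mass on the
    # gold token and on memory entries that store the gold token.
    log_z = torch.logsumexp(torch.cat([vocab_logits, mem_logits], dim=-1), dim=-1)
    gold = vocab_logits.gather(1, targets.unsqueeze(1))
    log_pos = torch.logsumexp(torch.cat([gold, pos_mem], dim=-1), dim=-1)
    return (log_z - log_pos).mean()

# Toy usage
loss = in_batch_memory_loss(torch.randn(4, 8), torch.tensor([1, 3, 1, 2]), torch.randn(10, 8))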
arXiv Detail & Related papers (2022-05-25T11:37:29Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval [12.514666775853598]
We propose a novel framework to leverage the advantages of interactive and non-interactive models.
We introduce a semi-interactive mechanism, which builds our model upon a non-interactive architecture but encodes each document together with its associated multilingual queries.
Our methods significantly boost retrieval accuracy while maintaining computational efficiency.
arXiv Detail & Related papers (2021-11-03T03:03:19Z)
- Capturing Structural Locality in Non-parametric Language Models [85.94669097485992]
We propose a simple yet effective approach for adding locality information into non-parametric language models.
Experiments on two different domains, Java source code and Wikipedia text, demonstrate that locality features improve model efficacy.
arXiv Detail & Related papers (2021-10-06T15:53:38Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
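For context, these speed-ups are relative to the standard nearest-neighbor language model recipe, which interpolates the parametric distribution with one induced by retrieved datastore entries; a minimal NumPy sketch follows (the function name and fixed interpolation weight are illustrative assumptions, not the paper's code):

import numpy as np

def knn_lm_probs(p_lm, neighbor_tokens, neighbor_dists, vocab_size, lam=0.25, temperature=1.0):
    # p_lm: (vocab,) parametric LM distribution for the current context
    # neighbor_tokens: (k,) tokens stored with the retrieved datastore keys
    # neighbor_dists: (k,) query-to-key distances from the nearest-neighbor search
    weights = np.exp(-np.asarray(neighbor_dists, dtype=float) / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, np.asarray(neighbor_tokens), weights)  # aggregate weight per vocabulary item
    return lam * p_knn + (1.0 - lam) * p_lm                 # fixed-weight interpolation

# Toy usage: both inputs are proper distributions, so the output is too.
p = knn_lm_probs(np.full(5, 0.2), neighbor_tokens=[2, 2, 4], neighbor_dists=[0.1, 0.3, 0.9], vocab_size=5)

Much of the extra inference cost in this recipe comes from the nearest-neighbor search over a large datastore, which is the natural target for efficiency methods like those summarized above.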
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Improved Semantic Role Labeling using Parameterized Neighborhood Memory Adaptation [22.064890647610348]
We propose a parameterized neighborhood memory adaptive (PNMA) method that uses a parameterized representation of the nearest neighbors of tokens in a memory of activations.
We empirically show that PNMA consistently improves the SRL performance of the base model irrespective of types of word embeddings.
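As one plausible reading of this summary (the attention-pooling below is an assumption, not the paper's PNMA definition), retrieved neighbor activations can be pooled with attention and mixed back into the token representation before the SRL classifier:

import torch
import torch.nn as nn

class NeighborMemoryAdapter(nn.Module):
    """Hypothetical sketch: attention-pool the activations of a token's nearest
    neighbors from a memory, then combine the pooled summary with the token."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.mix = nn.Linear(2 * dim, dim)

    def forward(self, token_h: torch.Tensor, neighbor_h: torch.Tensor) -> torch.Tensor:
        # token_h: (batch, dim); neighbor_h: (batch, k, dim) memory activations
        scores = torch.einsum("bd,bkd->bk", self.query(token_h), neighbor_h)
        pooled = torch.einsum("bk,bkd->bd", torch.softmax(scores, dim=-1), neighbor_h)
        return self.mix(torch.cat([token_h, pooled], dim=-1))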
arXiv Detail & Related papers (2020-11-29T22:51:25Z)
- Coarse-to-Fine Memory Matching for Joint Retrieval and Classification [0.7081604594416339]
We present a novel end-to-end language model for joint retrieval and classification.
We evaluate it on the standard blind test set of the FEVER fact verification dataset.
We extend exemplar auditing to this setting for analyzing and constraining the model.
arXiv Detail & Related papers (2020-11-29T05:06:03Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory (LSTM) language models can capture the power-law decay of these dependencies.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
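The link between gating and timescales rests on standard LSTM cell algebra (stated here for background; the paper's full derivation may differ in its details): a roughly constant forget-gate value f makes the cell state decay geometrically, which assigns each unit an effective timescale.

\[
  c_{t+n} \approx f^{\,n} c_t = e^{\,n \ln f}\, c_t
  \qquad\Longrightarrow\qquad
  T(f) = -\frac{1}{\ln f},
\]

so a distribution over per-unit forget-gate values induces a distribution over timescales T, and matching the power-law decay of dependencies in text amounts to shaping that induced distribution.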
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
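Schematically (a reading of this summary rather than the paper's exact model), the generative story factorizes task-and-language-specific parameters through latent task and language variables:

\[
  z_\tau \sim p(z_\tau), \qquad
  z_\ell \sim p(z_\ell), \qquad
  \theta_{\tau,\ell} \sim p(\theta \mid z_\tau, z_\ell), \qquad
  y \sim p(y \mid x, \theta_{\tau,\ell}),
\]

so posteriors over the latent variables, inferred from seen task-language combinations, can be recombined to produce parameters for a combination never observed in training.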
arXiv Detail & Related papers (2020-01-30T16:58:56Z)