Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
- URL: http://arxiv.org/abs/2201.12431v1
- Date: Fri, 28 Jan 2022 21:38:56 GMT
- Title: Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
- Authors: Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham
Neubig
- Abstract summary: RetoMaton is a weighted finite automaton built on top of the datastore.
Traversing this automaton at inference time, in parallel to the LM inference, reduces the LM's perplexity.
- Score: 129.25914272977542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-based language models (R-LM) model the probability of natural
language text by combining a standard language model (LM) with examples
retrieved from an external datastore at test time. While effective, a major
bottleneck of using these models in practice is the computationally costly
datastore search, which can be performed as frequently as every time step. In
this paper, we present RetoMaton -- retrieval automaton -- which approximates
the datastore search, based on (1) clustering of entries into "states", and (2)
state transitions from previous entries. This effectively results in a weighted
finite automaton built on top of the datastore, instead of representing the
datastore as a flat list. The creation of the automaton is unsupervised, and a
RetoMaton can be constructed from any text collection: either the original
training corpus or from another domain. Traversing this automaton at inference
time, in parallel to the LM inference, reduces its perplexity, or alternatively
saves up to 83% of the nearest neighbor searches over kNN-LM (Khandelwal et
al., 2020), without hurting perplexity.
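  A minimal sketch of the retrieval mechanism, in plain NumPy, may help make the search-saving idea concrete. This is an illustration under assumptions, not the authors' released implementation: the datastore is assumed to be corpus-ordered so that entry i + 1 is the successor of entry i, the clustering of entries into states is omitted, and all names (Datastore, knn_distribution, next_token_probs, lam) are hypothetical. At each step the model either follows pointers saved from the entries retrieved at the previous step, when their stored continuation matches the token that was actually generated, or falls back to a fresh nearest-neighbor search.

```python
# Illustrative sketch: kNN-LM-style interpolation with pointer reuse,
# approximating the idea of traversing a retrieval automaton instead of
# searching the datastore at every step. Hypothetical, simplified code.
import numpy as np

class Datastore:
    """Flat (key, next_token) pairs; keys are LM context vectors,
    stored in corpus order so index i + 1 follows index i."""
    def __init__(self, keys, next_tokens):
        self.keys = np.asarray(keys, dtype=np.float32)   # shape (N, d)
        self.next_tokens = np.asarray(next_tokens)       # shape (N,)

    def knn_search(self, query, k=8):
        # Exact search for clarity; a real system would use an
        # approximate index (e.g. FAISS) over millions of keys.
        dists = np.linalg.norm(self.keys - query, axis=1)
        return np.argsort(dists)[:k]

def knn_distribution(datastore, query, neighbor_ids, vocab_size, temp=1.0):
    # Turn retrieved entries into a next-token distribution,
    # weighting each entry by softmax(-distance to the query).
    dists = np.linalg.norm(datastore.keys[neighbor_ids] - query, axis=1)
    weights = np.exp(-(dists - dists.min()) / temp)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, neighbor_ids):
        p_knn[datastore.next_tokens[idx]] += w
    return p_knn

def next_token_probs(p_lm, datastore, query, prev_ids, generated_token,
                     vocab_size, lam=0.25):
    """One decoding step: follow pointers when possible, else search."""
    pointers = []
    if prev_ids is not None:
        # Entries retrieved at the previous step whose stored next token
        # matches the token we actually generated point to their
        # successors (index + 1), so no new search is needed for them.
        pointers = [i + 1 for i in prev_ids
                    if datastore.next_tokens[i] == generated_token
                    and i + 1 < len(datastore.keys)]
    neighbor_ids = pointers if pointers else datastore.knn_search(query)
    p_knn = knn_distribution(datastore, query, neighbor_ids, vocab_size)
    return lam * p_knn + (1 - lam) * p_lm, neighbor_ids
```

  In the paper, pointers lead to clusters of entries ("states") with transition weights rather than single entries, which is what lets traversal either lower perplexity or skip up to 83% of the nearest-neighbor searches; the sketch above only keeps the pointer-following intuition.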
Related papers
- You can't pick your neighbors, or can you? When and how to rely on
retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model (the interpolation is written out after this list).
We empirically measure the effectiveness of our approach on two English language modeling datasets.
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
- Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
- Chunk-based Nearest Neighbor Machine Translation [7.747003493657217]
We introduce a chunk-based $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token.
Experiments on machine translation in two settings, static domain adaptation and "on-the-fly" adaptation, show that the chunk-based model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.
arXiv Detail & Related papers (2022-05-24T17:39:25Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that does not force any structure in the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Neural Data-to-Text Generation with LM-based Text Augmentation [27.822282190362856]
We show that a weakly supervised training paradigm is able to outperform fully supervised seq2seq models with less than 10% annotations.
By utilizing all annotated data, our model can boost the performance of a standard seq2seq model by over 5 BLEU points.
arXiv Detail & Related papers (2021-02-06T10:21:48Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
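Since the first related entry above, like RetoMaton itself, builds directly on the $k$NN-LM, its interpolation can be written out explicitly (the standard formulation from Khandelwal et al., 2020; here $f(c_t)$ is the LM's representation of the context $c_t$, $\mathcal{N}(c_t)$ the retrieved set of key-value pairs, $d$ a distance such as squared L2, and $\lambda$ a tuned interpolation weight):

$p(w_t \mid c_t) = \lambda \, p_{\mathrm{kNN}}(w_t \mid c_t) + (1 - \lambda) \, p_{\mathrm{LM}}(w_t \mid c_t)$

$p_{\mathrm{kNN}}(w_t \mid c_t) \propto \sum_{(k_i, v_i) \in \mathcal{N}(c_t)} \mathbb{1}[w_t = v_i] \, \exp\!\big(-d(k_i, f(c_t))\big)$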
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.