Small Character Models Match Large Word Models for Autocomplete Under
Memory Constraints
- URL: http://arxiv.org/abs/2210.03251v2
- Date: Wed, 7 Jun 2023 23:15:30 GMT
- Title: Small Character Models Match Large Word Models for Autocomplete Under
Memory Constraints
- Authors: Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad
Abdul-Mageed, Laks V.S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo
Henrique de Rosa, Shital Shah
- Abstract summary: We study the more challenging open-domain setting consisting of low frequency user prompt patterns.
Character-based representation is effective in reducing the overall model size.
We show that a 20M parameter character model performs similarly to an 80M parameter word model in the vanilla setting.
- Score: 32.79377465262468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autocomplete is a task where the user inputs a piece of text, termed
the prompt, which the model conditions on to generate a semantically coherent
continuation. Existing works for this task have primarily focused on datasets
(e.g., email, chat) with high frequency user prompt patterns (or focused
prompts) where word-based language models have been quite effective. In this
work, we study the more challenging open-domain setting consisting of low
frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd
Academy Awards) and demonstrate the effectiveness of character-based language
models. We study this problem under memory-constrained settings (e.g., edge
devices and smartphones), where character-based representation is effective in
reducing the overall model size (in terms of parameters). We use the WikiText-103
benchmark to simulate broad prompts and demonstrate that character models rival
word models in exact-match accuracy for the autocomplete task when controlled
for model size. For instance, we show that a 20M parameter character model
performs similarly to an 80M parameter word model in the vanilla setting. We
further propose novel methods to improve character models by incorporating
inductive bias in the form of compositional information and representation
transfer from large word models. Datasets and code used in this work are
available at https://github.com/UBC-NLP/char_autocomplete.
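Much of the size gap cited above (a 20M parameter character model vs. an 80M parameter word model) comes from the vocabulary-dependent embedding and output tables. The following is a rough back-of-the-envelope sketch of that arithmetic, not the paper's actual configurations; the vocabulary and hidden sizes are hypothetical stand-ins (the word vocabulary is roughly in the range of WikiText-103's).

```python
# Back-of-the-envelope sketch: character vocabularies shrink the
# vocabulary-dependent embedding/output tables, which dominate the
# parameter count of small language models. All sizes below are
# hypothetical illustrations, not the paper's configurations.

def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = True) -> int:
    """Parameters in the token embedding table (doubled if the output
    projection is not tied to the input embedding)."""
    table = vocab_size * hidden_dim
    return table if tied else 2 * table

word_vocab, char_vocab, hidden = 260_000, 256, 512

print("word-level table params:", embedding_params(word_vocab, hidden))  # ~133M
print("char-level table params:", embedding_params(char_vocab, hidden))  # ~0.13M
```

Tying or factorizing the output layer narrows this gap somewhat, which is one reason size-controlled comparisons of the two representations are informative.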
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- Learning Mutually Informed Representations for Characters and Subwords [26.189422354038978]
We introduce the entanglement model, aiming to combine character and subword language models.
Inspired by vision-language models, our model treats characters and subwords as separate modalities.
We evaluate our model on text classification, named entity recognition, POS-tagging, and character-level sequence labeling.
arXiv Detail & Related papers (2023-11-14T02:09:10Z)
- Small Language Models for Tabular Data [0.0]
We show the ability of deep representation learning to address problems of classification and regression from small and poorly formed datasets.
We find that small models have sufficient capacity for approximation of various functions and achieve record classification benchmark accuracy.
arXiv Detail & Related papers (2022-11-05T16:57:55Z)
- Don't Prompt, Search! Mining-based Zero-Shot Learning with Language Models [37.8952605358518]
Masked language models like BERT can perform text classification in a zero-shot fashion.
We propose an alternative mining-based approach for zero-shot learning.
arXiv Detail & Related papers (2022-10-26T15:52:30Z)
- An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels [55.06990011183662]
We introduce a new method for selecting prompt templates without labeled examples and without direct access to the model.
Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task.
arXiv Detail & Related papers (2022-03-21T21:51:43Z)
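A minimal sketch of the mutual-information template-selection idea summarized in the entry above, assuming the criterion is estimated as the entropy of the averaged answer distribution minus the average per-example entropy; the function names, variable names, and estimator details are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def mutual_information_score(answer_probs: np.ndarray) -> float:
    """Estimate I(X; Y) = H(mean_x p(y|x)) - mean_x H(p(y|x)).

    answer_probs: (num_examples, num_answer_options) array, each row the
    model's distribution over answer options for one unlabeled example
    rendered with a given prompt template.
    """
    eps = 1e-12
    marginal = answer_probs.mean(axis=0)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.sum(answer_probs * np.log(answer_probs + eps), axis=1).mean()
    return h_marginal - h_conditional

def select_template(template_probs: dict) -> str:
    """Pick the template whose outputs carry the most information about the inputs.
    `template_probs` maps template name -> (num_examples, num_options) array."""
    return max(template_probs, key=lambda t: mutual_information_score(template_probs[t]))
```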
- Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer [12.596033546002321]
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning.
For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings.
For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings.
arXiv Detail & Related papers (2022-01-14T12:04:37Z)
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization [50.16128796194463]
We propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
We introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters.
We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level.
arXiv Detail & Related papers (2021-06-23T22:24:14Z)
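A rough sketch of the soft block-mixing idea behind the GBST module described in the Charformer entry above: mean-pool character blocks of several sizes per position, score each candidate, and combine them with a softmax. The block sizes, the linear scorer, and the omission of Charformer's downsampling step are simplifications assumed here for illustration.

```python
import numpy as np

def gbst_sketch(char_embs: np.ndarray, score_w: np.ndarray,
                block_sizes=(1, 2, 3, 4)) -> np.ndarray:
    """Softly mix mean-pooled character blocks of several sizes per position.

    char_embs: (seq_len, dim) character embeddings.
    score_w:   (dim,) scoring vector, a stand-in for the learned scorer.
    Returns    (seq_len, dim) latent subword representations.
    """
    seq_len, _ = char_embs.shape
    candidates, scores = [], []
    for b in block_sizes:
        # Mean-pool non-overlapping blocks of size b, then give every
        # position the vector of the block that contains it.
        pooled = np.stack([
            char_embs[(i // b) * b : (i // b) * b + b].mean(axis=0)
            for i in range(seq_len)
        ])
        candidates.append(pooled)            # (seq_len, dim)
        scores.append(pooled @ score_w)      # (seq_len,)
    cand = np.stack(candidates, axis=1)      # (seq_len, n_blocks, dim)
    sc = np.stack(scores, axis=1)            # (seq_len, n_blocks)
    weights = np.exp(sc - sc.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over block sizes
    return (weights[..., None] * cand).sum(axis=1)
```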
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
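A simplified sketch of a compositional output layer in the spirit of the entry above: each word's output embedding is computed from its spelling rather than stored per word, so the output layer's size does not grow with the vocabulary. The mean-pooled character encoder, projection, and dimensions are stand-ins assumed for illustration; the paper's actual encoder may differ.

```python
import numpy as np

def compose_output_embedding(word: str, char_table: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Build a word's output embedding from its characters (mean of char
    embeddings followed by a linear projection), so no per-word output
    row needs to be stored."""
    char_vecs = np.stack([char_table[ord(c) % char_table.shape[0]] for c in word])
    return char_vecs.mean(axis=0) @ proj

def next_word_logits(hidden: np.ndarray, candidates: list, char_table, proj) -> np.ndarray:
    """Score candidate continuations with composed embeddings: logit = h . e(word)."""
    embs = np.stack([compose_output_embedding(w, char_table, proj) for w in candidates])
    return embs @ hidden

# Hypothetical dimensions for illustration.
rng = np.random.default_rng(0)
char_table = rng.normal(size=(256, 32))   # one row per byte, not per word
proj = rng.normal(size=(32, 64))
hidden = rng.normal(size=64)
print(next_word_logits(hidden, ["the", "award", "ceremony"], char_table, proj))
```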
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)