Small Character Models Match Large Word Models for Autocomplete Under
Memory Constraints
- URL: http://arxiv.org/abs/2210.03251v2
- Date: Wed, 7 Jun 2023 23:15:30 GMT
- Title: Small Character Models Match Large Word Models for Autocomplete Under
Memory Constraints
- Authors: Ganesh Jawahar, Subhabrata Mukherjee, Debadeepta Dey, Muhammad
Abdul-Mageed, Laks V.S. Lakshmanan, Caio Cesar Teodoro Mendes, Gustavo
Henrique de Rosa, Shital Shah
- Abstract summary: We study the more challenging open-domain setting consisting of low frequency user prompt patterns.
Character-based representation is effective in reducing the overall model size.
We show that a 20M parameter character model performs similarly to an 80M parameter word model in the vanilla setting.
- Score: 32.79377465262468
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autocomplete is a task where the user inputs a piece of text, termed
the prompt, which the model conditions on to generate a semantically coherent
continuation. Existing works for this task have primarily focused on datasets
(e.g., email, chat) with high frequency user prompt patterns (or focused
prompts) where word-based language models have been quite effective. In this
work, we study the more challenging open-domain setting consisting of low
frequency user prompt patterns (or broad prompts, e.g., a prompt about the 93rd
Academy Awards) and demonstrate the effectiveness of character-based language
models. We study this problem under memory-constrained settings (e.g., edge
devices and smartphones), where character-based representation is effective in
reducing the overall model size (in terms of parameters). We use the WikiText-103
benchmark to simulate broad prompts and demonstrate that character models rival
word models in exact-match accuracy for the autocomplete task when controlled
for model size. For instance, we show that a 20M parameter character model
performs similarly to an 80M parameter word model in the vanilla setting. We
further propose novel methods to improve character models by incorporating
inductive bias in the form of compositional information and representation
transfer from large word models. Datasets and code used in this work are
available at https://github.com/UBC-NLP/char_autocomplete.
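Much of the size gap cited above (a 20M parameter character model vs. an 80M parameter word model) comes from the vocabulary-dependent embedding and output tables. The following is a rough back-of-the-envelope sketch of that arithmetic, not the paper's actual configurations; the vocabulary and hidden sizes are hypothetical stand-ins (the word vocabulary is roughly in the range of WikiText-103's).

```python
# Back-of-the-envelope sketch: character vocabularies shrink the
# vocabulary-dependent embedding/output tables, which dominate the
# parameter count of small language models. All sizes below are
# hypothetical illustrations, not the paper's configurations.

def embedding_params(vocab_size: int, hidden_dim: int, tied: bool = True) -> int:
    """Parameters in the token embedding table (doubled if the output
    projection is not tied to the input embedding)."""
    table = vocab_size * hidden_dim
    return table if tied else 2 * table

word_vocab, char_vocab, hidden = 260_000, 256, 512

print("word-level table params:", embedding_params(word_vocab, hidden))  # ~133M
print("char-level table params:", embedding_params(char_vocab, hidden))  # ~0.13M
```

Tying or factorizing the output layer narrows this gap somewhat, which is one reason size-controlled comparisons of the two representations are informative.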
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
- Learning Mutually Informed Representations for Characters and Subwords [26.189422354038978]
We introduce the entanglement model, aiming to combine character and subword language models.
Inspired by vision-language models, our model treats characters and subwords as separate modalities.
We evaluate our model on text classification, named entity recognition, POS-tagging, and character-level sequence labeling.
arXiv Detail & Related papers (2023-11-14T02:09:10Z)
- Small Language Models for Tabular Data [0.0]
We show the ability of deep representation learning to address problems of classification and regression from small and poorly formed datasets.
We find that small models have sufficient capacity for approximation of various functions and achieve record classification benchmark accuracy.
arXiv Detail & Related papers (2022-11-05T16:57:55Z)
- Don't Prompt, Search! Mining-based Zero-Shot Learning with Language Models [37.8952605358518]
Masked language models like BERT can perform text classification in a zero-shot fashion.
We propose an alternative mining-based approach for zero-shot learning.
arXiv Detail & Related papers (2022-10-26T15:52:30Z)
- An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels [55.06990011183662]
We introduce a new method for selecting prompt templates without labeled examples and without direct access to the model.
Across 8 datasets representing 7 distinct NLP tasks, we show that when a template has high mutual information, it also has high accuracy on the task.
arXiv Detail & Related papers (2022-03-21T21:51:43Z)
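A minimal sketch of the mutual-information template-selection idea summarized in the entry above, assuming the criterion is estimated as the entropy of the averaged answer distribution minus the average per-example entropy; the function names, variable names, and estimator details are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def mutual_information_score(answer_probs: np.ndarray) -> float:
    """Estimate I(X; Y) = H(mean_x p(y|x)) - mean_x H(p(y|x)).

    answer_probs: (num_examples, num_answer_options) array, each row the
    model's distribution over answer options for one unlabeled example
    rendered with a given prompt template.
    """
    eps = 1e-12
    marginal = answer_probs.mean(axis=0)
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    h_conditional = -np.sum(answer_probs * np.log(answer_probs + eps), axis=1).mean()
    return h_marginal - h_conditional

def select_template(template_probs: dict) -> str:
    """Pick the template whose outputs carry the most information about the inputs.
    `template_probs` maps template name -> (num_examples, num_options) array."""
    return max(template_probs, key=lambda t: mutual_information_score(template_probs[t]))
```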
- Eliciting Knowledge from Pretrained Language Models for Prototypical Prompt Verbalizer [12.596033546002321]
In this paper, we focus on eliciting knowledge from pretrained language models and propose a prototypical prompt verbalizer for prompt-tuning.
For zero-shot settings, knowledge is elicited from pretrained language models by a manually designed template to form initial prototypical embeddings.
For few-shot settings, models are tuned to learn meaningful and interpretable prototypical embeddings.
arXiv Detail & Related papers (2022-01-14T12:04:37Z)
- Charformer: Fast Character Transformers via Gradient-based Subword Tokenization [50.16128796194463]
We propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
We introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters.
We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level.
arXiv Detail & Related papers (2021-06-23T22:24:14Z)
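A rough sketch of the soft block-mixing idea behind the GBST module described in the Charformer entry above: mean-pool character blocks of several sizes per position, score each candidate, and combine them with a softmax. The block sizes, the linear scorer, and the omission of Charformer's downsampling step are simplifications assumed here for illustration.

```python
import numpy as np

def gbst_sketch(char_embs: np.ndarray, score_w: np.ndarray,
                block_sizes=(1, 2, 3, 4)) -> np.ndarray:
    """Softly mix mean-pooled character blocks of several sizes per position.

    char_embs: (seq_len, dim) character embeddings.
    score_w:   (dim,) scoring vector, a stand-in for the learned scorer.
    Returns    (seq_len, dim) latent subword representations.
    """
    seq_len, _ = char_embs.shape
    candidates, scores = [], []
    for b in block_sizes:
        # Mean-pool non-overlapping blocks of size b, then give every
        # position the vector of the block that contains it.
        pooled = np.stack([
            char_embs[(i // b) * b : (i // b) * b + b].mean(axis=0)
            for i in range(seq_len)
        ])
        candidates.append(pooled)            # (seq_len, dim)
        scores.append(pooled @ score_w)      # (seq_len,)
    cand = np.stack(candidates, axis=1)      # (seq_len, n_blocks, dim)
    sc = np.stack(scores, axis=1)            # (seq_len, n_blocks)
    weights = np.exp(sc - sc.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over block sizes
    return (weights[..., None] * cand).sum(axis=1)
```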
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
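A simplified sketch of a compositional output layer in the spirit of the entry above: each word's output embedding is computed from its spelling rather than stored per word, so the output layer's size does not grow with the vocabulary. The mean-pooled character encoder, projection, and dimensions are stand-ins assumed for illustration; the paper's actual encoder may differ.

```python
import numpy as np

def compose_output_embedding(word: str, char_table: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Build a word's output embedding from its characters (mean of char
    embeddings followed by a linear projection), so no per-word output
    row needs to be stored."""
    char_vecs = np.stack([char_table[ord(c) % char_table.shape[0]] for c in word])
    return char_vecs.mean(axis=0) @ proj

def next_word_logits(hidden: np.ndarray, candidates: list, char_table, proj) -> np.ndarray:
    """Score candidate continuations with composed embeddings: logit = h . e(word)."""
    embs = np.stack([compose_output_embedding(w, char_table, proj) for w in candidates])
    return embs @ hidden

# Hypothetical dimensions for illustration.
rng = np.random.default_rng(0)
char_table = rng.normal(size=(256, 32))   # one row per byte, not per word
proj = rng.normal(size=(32, 64))
hidden = rng.normal(size=64)
print(next_word_logits(hidden, ["the", "award", "ceremony"], char_table, proj))
```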
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)