Related papers: Adapting Large Language Models for Character-based Augmentative and Alternative Communication

Adapting Large Language Models for Character-based Augmentative and Alternative Communication

URL: http://arxiv.org/abs/2501.10582v1
Date: Fri, 17 Jan 2025 22:20:55 GMT
Title: Adapting Large Language Models for Character-based Augmentative and Alternative Communication
Authors: Dylan Gaines, Keith Vertanen,
Abstract summary: Augmentative and Alternative Communication (AAC) users may write letter-by-letter via an interface that uses a character language model.<n>We investigate how to practically use such models to make accurate and efficient character predictions.
Score: 8.072353085704629
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Users of Augmentative and Alternative Communication (AAC) may write letter-by-letter via an interface that uses a character language model. However, most state-of-the-art large pretrained language models predict subword tokens of variable length. We investigate how to practically use such models to make accurate and efficient character predictions. We fine-tune models using a large dataset of sentences we curated in which each sentence is rated according to how useful it might be for spoken or written AAC communication. We find that using an algorithm to produce character predictions from a subword large language model provides more accurate predictions than adding a classification layer or using a byte-level model. We also find that our domain adaptation curriculum is effective at improving model performance on simple, conversational text.

Related papers

Vocabulary Expansion of Chat Models with Unlabeled Target Language Data [38.341705137026985]
Chat models (i.e. language models trained to follow instructions through conversation with humans) outperform base models (i.e. trained solely on unlabeled data) in both conversation and general task-solving abilities.<n>These models are generally English-centric and require further adaptation for languages that are underrepresented in or absent from their training data.<n>We propose post-hoc techniques that inject information from the source model without requiring any further training. Experiments reveal the effectiveness of our methods, helping the adapted models to achieve performance improvements in 87% of cases.
arXiv Detail & Related papers (2024-12-16T12:26:28Z)
Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings. An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts) This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation. We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings. The approach uses a masked language modelling objective to learn the representations. We show that the performance of the resulting encodings is better than comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low resource languages is a challenging task because the patterns are hard to predict. This work shows a comparison of a neural model and character language models with varying amounts on target language data. Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
Are Some Words Worth More than Others? [3.5598388686985354]
We propose two new intrinsic evaluation measures within the framework of a simple word prediction task. We evaluate several commonly-used large English language models using our proposed metrics.
arXiv Detail & Related papers (2020-10-12T23:12:11Z)
Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary$-$typically selected before training and permanently fixed later$-$affects its size. We propose a fully compositional output embedding layer for language models. To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
Toward Better Storylines with Sentence-Level Language Models [54.91921545103256]
We propose a sentence-level language model which selects the next sentence in a story from a finite set of fluent alternatives. We demonstrate the effectiveness of our approach with state-of-the-art accuracy on the unsupervised Story Cloze task.
arXiv Detail & Related papers (2020-05-11T16:54:19Z)
Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.