Grounded Compositional Outputs for Adaptive Language Modeling
- URL: http://arxiv.org/abs/2009.11523v2
- Date: Mon, 5 Oct 2020 18:26:38 GMT
- Title: Grounded Compositional Outputs for Adaptive Language Modeling
- Authors: Nikolaos Pappas, Phoebe Mulcaire, Noah A. Smith
- Abstract summary: A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models have emerged as a central component across NLP, and a great
deal of progress depends on the ability to cheaply adapt them (e.g., through
finetuning) to new domains and tasks. A language model's vocabulary, typically
selected before training and permanently fixed later, affects its size and is
part of what makes it resistant to such adaptation. Prior work has used
compositional input embeddings based on surface forms to ameliorate this issue.
In this work, we go one step beyond and propose a fully compositional output
embedding layer for language models, which is further grounded in information
from a structured lexicon (WordNet), namely semantically related words and
free-text definitions. To our knowledge, the result is the first word-level
language model with a size that does not depend on the training vocabulary. We
evaluate the model on conventional language modeling as well as challenging
cross-domain settings with an open vocabulary, finding that it matches or
outperforms previous state-of-the-art output embedding methods and adaptation
approaches. Our analysis attributes the improvements to sample efficiency: our
model is more accurate for low-frequency words.
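The paper's central mechanism invites a short sketch. Below is a minimal, hypothetical PyTorch rendering of a fully compositional output layer, under these assumptions: each candidate word's output embedding is composed from its character sequence plus an encoding of a free-text definition (a stand-in for the paper's WordNet grounding), and next-word scores are dot products with the LM's hidden state. The class name and the choice of bidirectional GRU encoders are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CompositionalOutputEmbedding(nn.Module):
    """Sketch: output embeddings computed from spelling and definition
    text, so the softmax layer has no per-word parameters and its size
    is independent of the training vocabulary."""

    def __init__(self, n_chars: int, char_dim: int = 64, word_dim: int = 256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Hypothetical encoders: one over the surface form, one over the
        # definition's characters (a simplification of WordNet grounding).
        self.form_enc = nn.GRU(char_dim, word_dim // 2, batch_first=True, bidirectional=True)
        self.def_enc = nn.GRU(char_dim, word_dim // 2, batch_first=True, bidirectional=True)

    def embed(self, chars: torch.Tensor, defs: torch.Tensor) -> torch.Tensor:
        """chars, defs: (vocab, max_len) padded character-id matrices."""
        _, h_form = self.form_enc(self.char_emb(chars))   # (2, vocab, word_dim // 2)
        _, h_def = self.def_enc(self.char_emb(defs))
        form = torch.cat([h_form[0], h_form[1]], dim=-1)  # (vocab, word_dim)
        gloss = torch.cat([h_def[0], h_def[1]], dim=-1)
        return form + gloss                               # grounded embedding per word

    def logits(self, hidden: torch.Tensor, chars, defs) -> torch.Tensor:
        """hidden: (batch, word_dim) LM states -> (batch, vocab) scores."""
        E = self.embed(chars, defs)  # recomputable for any open vocabulary
        return hidden @ E.t()
```

Because `embed` is a function of surface forms and definitions rather than a lookup table, the same parameters can score words never seen during training, which is consistent with the reported gains on low-frequency words.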
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough?
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- TAMS: Translation-Assisted Morphological Segmentation
We present a sequence-to-sequence model for canonical morpheme segmentation.
Our model outperforms the baseline in a super-low resource setting but yields mixed results on training splits with more data.
While further work is needed to make translations useful in higher-resource settings, our model shows promise in severely resource-constrained settings.
arXiv Detail & Related papers (2024-03-21T21:23:35Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
- Pre-Training to Learn in Context
The ability of language models to learn in context is not fully exploited because they are not explicitly trained to do so.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming language models with nearly 4x as many parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z)
- Word-Level Representation From Bytes For Language Modeling
Sub-word tokenization is not robust to noise and is difficult to generalize to new languages.
We introduce a cross-attention network that builds word-level representation directly from bytes, and a sub-word level prediction based on word-level hidden states.
Byte2Word is on par with the strong sub-word baseline BERT while taking up only 10% of the embedding size (a minimal sketch of this byte-to-word idea follows the related-papers list).
arXiv Detail & Related papers (2022-11-23T03:11:13Z)
- Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
We find that pre-trained speech models optimally encode language discriminatory information in lower layers.
We demonstrate that the embeddings obtained from these layers are robust for classifying unseen languages.
We open-source the model through the NVIDIA NeMo toolkit.
arXiv Detail & Related papers (2022-11-09T18:53:59Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
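As noted in the Byte2Word entry above, here is a minimal sketch of the byte-to-word idea, under stated assumptions: a single learned query cross-attends over the byte embeddings of each word, so a 256-row byte table replaces a large sub-word embedding matrix. The class name, dimensions, and the use of PyTorch's MultiheadAttention are illustrative guesses, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ByteToWord(nn.Module):
    """Sketch: build a word-level representation directly from bytes via
    cross-attention, replacing a large sub-word embedding table with a
    256-entry byte table plus a small attention module."""

    def __init__(self, word_dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.byte_emb = nn.Embedding(256, word_dim)              # one row per byte value
        self.query = nn.Parameter(torch.randn(1, 1, word_dim))   # learned word query
        self.attn = nn.MultiheadAttention(word_dim, n_heads, batch_first=True)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        """byte_ids: (n_words, max_bytes) -> (n_words, word_dim)."""
        kv = self.byte_emb(byte_ids)                     # byte embeddings as keys/values
        q = self.query.expand(byte_ids.size(0), -1, -1)  # one query per word
        out, _ = self.attn(q, kv, kv)                    # cross-attention over bytes
        return out.squeeze(1)

# Usage: encode the word "cat" from its UTF-8 bytes.
enc = ByteToWord()
ids = torch.tensor([list("cat".encode("utf-8"))])  # shape (1, 3)
vec = enc(ids)                                     # shape (1, 256)
```

The payoff mirrors the main paper's theme: parameters tied to surface structure (bytes, characters, definitions) rather than to a fixed vocabulary keep model size constant as the vocabulary grows.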
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.