xVal: A Continuous Number Encoding for Large Language Models
- URL: http://arxiv.org/abs/2310.02989v1
- Date: Wed, 4 Oct 2023 17:26:16 GMT
- Title: xVal: A Continuous Number Encoding for Large Language Models
- Authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti,
Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben
Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu,
Kyunghyun Cho, Shirley Ho
- Abstract summary: We propose xVal, a numerical encoding scheme that represents any real number using just a single token.
We empirically evaluate our proposal on a number of synthetic and real-world datasets.
- Score: 42.19323262199993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have not yet been broadly adapted for the analysis of
scientific datasets due in part to the unique difficulties of tokenizing
numbers. We propose xVal, a numerical encoding scheme that represents any real
number using just a single token. xVal represents a given real number by
scaling a dedicated embedding vector by the number value. Combined with a
modified number-inference approach, this strategy renders the model end-to-end
continuous when considered as a map from the numbers of the input string to
those of the output string. This leads to an inductive bias that is generally
more suitable for applications in scientific domains. We empirically evaluate
our proposal on a number of synthetic and real-world datasets. Compared with
existing number encoding schemes, we find that xVal is more token-efficient and
demonstrates improved generalization.
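For illustration, below is a minimal PyTorch-style sketch of the encoding idea described in the abstract: a single dedicated [NUM] embedding is scaled by each number's value, and a scalar "number head" regresses numeric outputs alongside the usual token predictions. The module names, dimensions, and token-id convention are assumptions for exposition, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class XValEmbedding(nn.Module):
    # Sketch only: every number in the input shares one dedicated [NUM] embedding,
    # which is scaled by the number's value so the representation varies
    # continuously with the number.
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)

    def forward(self, token_ids: torch.LongTensor, values: torch.Tensor) -> torch.Tensor:
        # `values` holds the numeric value at [NUM] positions and 1.0 elsewhere,
        # so ordinary tokens are left unscaled.
        return self.token_emb(token_ids) * values.unsqueeze(-1)

class NumberHead(nn.Module):
    # Sketch of a modified number-inference step: a scalar head reads the hidden
    # state wherever a [NUM] token is predicted and regresses the number directly,
    # keeping the map from input numbers to output numbers continuous.
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden).squeeze(-1)  # (batch, seq) of predicted values

# Hypothetical usage: the string "[NUM] + [NUM] =" with values 0.5 and 1.7,
# using made-up token ids (3 = [NUM], 7 = '+', 9 = '=').
ids = torch.tensor([[3, 7, 3, 9]])
vals = torch.tensor([[0.5, 1.0, 1.7, 1.0]])
embeddings = XValEmbedding(vocab_size=16, d_model=32)(ids, vals)  # shape (1, 4, 32)
```

Because the same [NUM] embedding direction is reused and only its magnitude changes with the value, nearby numbers receive nearby representations, which is the continuity property the abstract emphasizes.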
Related papers
- Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution [7.681258910515419]
Tabular data presents unique challenges due to its heterogeneous nature and complex structural relationships.
High predictive performance and robustness in tabular data analysis hold significant promise for numerous applications.
The recent advent of large language models, such as GPT and LLaMA, has further revolutionized the field, facilitating more advanced and diverse applications with minimal fine-tuning.
arXiv Detail & Related papers (2024-08-20T04:59:19Z)
- Computational Models to Study Language Processing in the Human Brain: A Survey [47.81066391664416]
This paper reviews efforts in using computational models for brain research, highlighting emerging trends.
Our analysis reveals that no single model outperforms others on all datasets.
arXiv Detail & Related papers (2024-03-20T08:01:22Z)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [139.69207791947738]
Dolma is a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
We document Dolma, including its design principles, details about its construction, and a summary of its contents.
We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices.
arXiv Detail & Related papers (2024-01-31T20:29:50Z)
- Arithmetic with Language Models: from Memorization to Computation [3.077668143048211]
This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data.
We successfully trained a lightweight language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing.
arXiv Detail & Related papers (2023-08-02T13:58:37Z)
- MIST: a Large-Scale Annotated Resource and Neural Models for Functions of Modal Verbs in English Scientific Text [1.8502316793903635]
We introduce the MIST dataset, which contains 3737 modal instances in five scientific domains annotated for their semantic, pragmatic, or rhetorical function.
We systematically evaluate a set of competitive neural architectures on MIST.
Our corpus analysis provides evidence that scientific communities differ in their usage of modal verbs.
arXiv Detail & Related papers (2022-12-14T11:10:03Z)
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based models on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- A Cognitive Regularizer for Language Modeling [36.256053903862956]
We augment the canonical MLE objective for training language models with a regularizer that encodes UID (uniform information density).
We find that using UID regularization consistently improves perplexity in language models.
We also find that UID-regularized language models are higher-entropy and produce text that is longer and more lexically diverse.
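As a rough illustration of what such a regularizer can look like, the sketch below assumes a common UID formulation that penalizes the variance of per-token surprisals; the exact objective, weighting, and hyperparameters are assumptions for illustration rather than the paper's precise method.

```python
import torch
import torch.nn.functional as F

def uid_regularized_loss(logits: torch.Tensor, targets: torch.Tensor, beta: float = 0.01) -> torch.Tensor:
    # Cross-entropy plus a UID-style penalty on the variance of per-token surprisals.
    # The variance formulation and the beta value are illustrative assumptions.
    surprisal = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none"
    ).view(targets.shape)                       # (batch, seq): -log p(target_t | context)
    nll = surprisal.mean()                      # canonical MLE objective
    uid_penalty = surprisal.var(dim=-1).mean()  # uneven information density is penalized
    return nll + beta * uid_penalty
```

In training, such a term would simply replace the plain cross-entropy loss, with beta tuned on validation perplexity.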
arXiv Detail & Related papers (2021-05-15T05:37:42Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work presents a comparison of a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, with models improving as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.