xVal: A Continuous Number Encoding for Large Language Models
- URL: http://arxiv.org/abs/2310.02989v1
- Date: Wed, 4 Oct 2023 17:26:16 GMT
- Title: xVal: A Continuous Number Encoding for Large Language Models
- Authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti,
Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben
Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu,
Kyunghyun Cho, Shirley Ho
- Abstract summary: We propose xVal, a numerical encoding scheme that represents any real number using just a single token.
We empirically evaluate our proposal on a number of synthetic and real-world datasets.
- Score: 42.19323262199993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models have not yet been broadly adapted for the analysis of
scientific datasets due in part to the unique difficulties of tokenizing
numbers. We propose xVal, a numerical encoding scheme that represents any real
number using just a single token. xVal represents a given real number by
scaling a dedicated embedding vector by the number value. Combined with a
modified number-inference approach, this strategy renders the model end-to-end
continuous when considered as a map from the numbers of the input string to
those of the output string. This leads to an inductive bias that is generally
more suitable for applications in scientific domains. We empirically evaluate
our proposal on a number of synthetic and real-world datasets. Compared with
existing number encoding schemes, we find that xVal is more token-efficient and
demonstrates improved generalization.
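As a rough illustration of the encoding described above, the sketch below scales a single shared [NUM] embedding by each number's value and reads predictions back out through a scalar regression head. The [NUM] token id, the sizes, and the linear number head are illustrative assumptions, not the paper's exact implementation.
```python
import torch
import torch.nn as nn

# All numbers share one [NUM] token; its embedding is scaled by the
# number's value, so the map from input numbers to embeddings (and,
# with a regression head, to output numbers) is continuous.
d_model, vocab_size, NUM = 64, 1000, 7   # illustrative sizes / token id
tok_embed = nn.Embedding(vocab_size, d_model)
number_head = nn.Linear(d_model, 1)      # assumed MSE-trained scalar head

def xval_embed(token_ids, values):
    # `values` holds the numeric value at [NUM] positions and 1.0
    # elsewhere, so ordinary tokens keep their unscaled embeddings.
    return tok_embed(token_ids) * values.unsqueeze(-1)

ids = torch.tensor([[3, NUM, 12]])
vals = torch.tensor([[1.0, 0.25, 1.0]])   # 0.25 is the encoded number
hidden = xval_embed(ids, vals)            # (1, 3, 64), continuous in 0.25
# Number inference: when the model emits [NUM], read the value from the
# scalar head instead of sampling digit tokens.
predicted = number_head(hidden)[0, 1, 0]
```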
Related papers
- Laying Anchors: Semantically Priming Numerals in Language Modeling [11.831883526217942]
We introduce strategies to semantically prime numerals in any corpus by generating anchors governed by the distribution of numerals in said corpus.
We demonstrate significant improvements in the mathematical grounding of our learned embeddings.
arXiv Detail & Related papers (2024-04-02T00:02:00Z)
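The summary leaves the anchoring procedure underspecified; the sketch below is one plausible reading, in which anchors are empirical quantiles of the corpus's numerals and each numeral is grounded against its nearest anchor. The quantile choice and priming format are assumptions, not the paper's method.
```python
import numpy as np

# Hypothetical reading of "anchors governed by the distribution of
# numerals": use empirical quantiles of the corpus's numbers as anchors
# and ground each numeral against its nearest anchor.
corpus_numerals = np.array([3, 7, 12, 19, 42, 90, 250, 1000, 1200, 5000])
anchors = np.quantile(corpus_numerals, np.linspace(0, 1, 4))

def nearest_anchor(x: float) -> float:
    return float(anchors[np.argmin(np.abs(anchors - x))])

# A numeral could then be "primed" in text relative to its anchor,
# e.g. emitting "42 (anchor 19)" so embeddings learn magnitude.
print(nearest_anchor(42.0))  # -> 19.0
```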
- NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning [27.584258258635945]
Language models struggle with handling numerical data and performing arithmetic operations.
We propose a simple adjustment to how numbers are represented by including the count of digits before each number.
Requiring the model to consider the number of digits first enhances its reasoning before it generates the actual number, as sketched below.
arXiv Detail & Related papers (2024-03-30T19:46:59Z)
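A minimal sketch of the digit-count prefix idea; the `{n:number}` bracket format is an illustrative choice, not necessarily the paper's exact notation.
```python
import re

def numerologic(text: str) -> str:
    """Prefix every integer with its digit count, e.g. '42' -> '{2:42}'."""
    return re.sub(r"\d+", lambda m: f"{{{len(m.group())}:{m.group()}}}", text)

print(numerologic("It costs 1250 dollars and 99 cents."))
# -> "It costs {4:1250} dollars and {2:99} cents."
```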
- Multi-Candidate Speculative Decoding [82.05519287513444]
Large language models have shown impressive capabilities across a variety of NLP tasks, yet generating text autoregressively with them is time-consuming.
One way to speed them up is speculative decoding, which generates candidate segments with a fast draft model that are then verified in parallel by the target model.
This paper proposes sampling multiple candidates from a draft model and then organising them in batches for verification.
We design algorithms for efficient multi-candidate verification while maintaining the distribution of the target model.
arXiv Detail & Related papers (2024-01-12T17:15:23Z)
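The toy sketch below shows only the mechanic of multi-candidate verification: draft several segments, check them against the target in one batched pass, and keep the longest verified prefix. It uses greedy matching with stand-in models; the paper's algorithms are sampling-based and preserve the target distribution.
```python
import torch

V, HORIZON, K = 50, 3, 4
torch.manual_seed(0)

def draft_sample(prefix, horizon):
    # Stand-in for a cheap draft model's sampled continuation.
    return torch.randint(0, V, (horizon,))

def target_logits(batch):
    # Stand-in for one batched forward pass of the large target model.
    return torch.randn(batch.shape[0], batch.shape[1], V)

prefix = torch.randint(0, V, (5,))
cands = [draft_sample(prefix, HORIZON) for _ in range(K)]
batch = torch.stack([torch.cat([prefix, c]) for c in cands])  # (K, 8)
logits = target_logits(batch)                                 # (K, 8, V)

best = torch.empty(0, dtype=torch.long)
for i, c in enumerate(cands):
    n = 0
    for t in range(HORIZON):
        # Logits at position p predict the token at position p + 1.
        if logits[i, prefix.numel() + t - 1].argmax() == c[t]:
            n += 1
        else:
            break
    if n > best.numel():
        best = c[:n]
accepted = torch.cat([prefix, best])  # tokens accepted in this step
```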
- Estimating Numbers without Regression [30.79061214333164]
Despite recent successes in language models, their ability to represent numbers is insufficient.
Subword tokenization splits numbers into arbitrary chunks and thus fails to explicitly capture magnitude.
We show that changing the model's vocabulary instead (e.g., introducing a new token for numbers in the range 10-100) is a far better trade-off.
arXiv Detail & Related papers (2023-10-09T23:07:05Z)
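A minimal sketch of the range-token idea, assuming log-scale (order-of-magnitude) bins for positive numbers; the token naming is illustrative.
```python
import math

def range_token(x: float) -> str:
    """Map a positive number to a coarse order-of-magnitude token,
    e.g. 37 -> '[NUM_1e1_1e2]' (a token for numbers in [10, 100))."""
    e = math.floor(math.log10(x))
    return f"[NUM_1e{e}_1e{e + 1}]"

for v in (3.7, 37.0, 3700.0):
    print(v, "->", range_token(v))
# 3.7 -> [NUM_1e0_1e1]; 37.0 -> [NUM_1e1_1e2]; 3700.0 -> [NUM_1e3_1e4]
```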
- Lexinvariant Language Models [84.2829117441298]
Token embeddings, a mapping from discrete lexical symbols to continuous vectors, are at the heart of any language model (LM).
We study lexinvariant language models that are invariant to lexical symbols and therefore do not need fixed token embeddings in practice.
We show that a lexinvariant LM can attain perplexity comparable to that of a standard language model, given a sufficiently long context.
arXiv Detail & Related papers (2023-05-24T19:10:46Z)
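A sketch of what lexinvariance can mean in practice, assuming the variant where each sequence draws fresh random embeddings per symbol type, so only within-context co-occurrence patterns carry information.
```python
import torch

def lexinvariant_embed(token_ids: torch.Tensor, d_model: int) -> torch.Tensor:
    # No fixed embedding table: each distinct symbol in this sequence
    # gets a fresh random vector, drawn anew for every sequence.
    vocab_in_seq, inverse = torch.unique(token_ids, return_inverse=True)
    rand_table = torch.randn(vocab_in_seq.numel(), d_model)
    return rand_table[inverse]

ids = torch.tensor([5, 9, 5, 2, 9])        # repeated symbols share a vector
emb = lexinvariant_embed(ids, d_model=16)  # (5, 16)
assert torch.equal(emb[0], emb[2])         # both positions of symbol 5
```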
- Linear-Time Modeling of Linguistic Structure: An Order-Theoretic Perspective [97.57162770792182]
Tasks that model the relation between pairs of tokens in a string are a vital part of understanding natural language.
We show that the exhaustive pairwise comparisons these tasks typically require can be avoided, and, moreover, that the complexity can be reduced to linear by casting the relation between tokens as a partial order over the string.
Our method predicts real numbers for each token in a string in parallel and sorts the tokens accordingly, resulting in total orders of the tokens in the string.
arXiv Detail & Related papers (2023-05-24T11:47:35Z)
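A sketch of the order-theoretic idea: one real number is predicted per token in parallel, and sorting those numbers yields a total order over the tokens. The linear head is an illustrative parameterization, not the paper's architecture.
```python
import torch
import torch.nn as nn

d_model, seq_len = 32, 6
order_head = nn.Linear(d_model, 1)        # illustrative scoring head

hidden = torch.randn(seq_len, d_model)    # token representations
scores = order_head(hidden).squeeze(-1)   # one real number per token
order = torch.argsort(scores)             # total order, no O(n^2) pairs
print(order)                              # e.g. tensor([3, 0, 5, 1, 4, 2])
```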
- Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems [42.782260686177395]
We propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models.
We first leverage simple numbers as anchors to probe the implicitly inferred arithmetic expressions from language models.
We transform and formulate the task as an analytically solvable linear system.
arXiv Detail & Related papers (2022-10-11T00:57:19Z)
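A toy version of the anchoring idea, assuming the model's implicit arithmetic is linear in its inputs: querying it at a few simple anchor inputs yields a solvable linear system that recovers the hidden expression. The stand-in `lm_answer` replaces actual LM queries.
```python
import numpy as np

def lm_answer(a, b):               # stand-in for querying the actual LM
    return 3 * a - 2 * b + 5       # hidden expression to recover

anchors = [(0, 0), (1, 0), (0, 1)] # simple numbers as anchors
A = np.array([[a, b, 1.0] for a, b in anchors])
y = np.array([lm_answer(a, b) for a, b in anchors])
w1, w2, c = np.linalg.solve(A, y)  # analytically solvable linear system
print(w1, w2, c)                   # -> 3.0 -2.0 5.0
```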
- Probing for the Usage of Grammatical Number [103.8175326220026]
We try to find encodings that the model actually uses, introducing a usage-based probing setup.
We focus on how BERT encodes grammatical number, and on how it uses this encoding to solve the number agreement task.
arXiv Detail & Related papers (2022-04-19T11:59:52Z)
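For contrast, the sketch below is a plain diagnostic probe on toy stand-in features; the paper's usage-based setup goes further and asks whether the model causally uses the encoding to solve agreement.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in features in place of real BERT noun representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))                 # hypothetical embeddings
y = (X[:, 3] > 0).astype(int)                  # 0 = singular, 1 = plural
probe = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
print("probe accuracy:", probe.score(X[150:], y[150:]))
```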
- NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
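A sketch of a NumGPT-style numeral embedding, assuming RBF distances to learned mantissa prototypes plus an exponent lookup; all sizes and the RBF choice are illustrative, not the paper's exact design.
```python
import torch
import torch.nn as nn

class NumeralEmbedding(nn.Module):
    def __init__(self, d_model=32, n_protos=8, max_exp=20):
        super().__init__()
        # Learned mantissa prototypes spanning [1, 10).
        self.protos = nn.Parameter(torch.linspace(1.0, 10.0, n_protos))
        self.mantissa_proj = nn.Linear(n_protos, d_model)
        self.exp_embed = nn.Embedding(2 * max_exp + 1, d_model)
        self.max_exp = max_exp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        exp = torch.floor(torch.log10(x))          # assumes x > 0
        mant = x / 10 ** exp                       # mantissa in [1, 10)
        rbf = torch.exp(-(mant.unsqueeze(-1) - self.protos) ** 2)
        exp_idx = (exp.long() + self.max_exp).clamp(0, 2 * self.max_exp)
        return self.mantissa_proj(rbf) + self.exp_embed(exp_idx)

emb = NumeralEmbedding()
print(emb(torch.tensor([3.7, 3700.0])).shape)  # torch.Size([2, 32])
```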
- An Empirical Investigation of Contextualized Number Prediction [34.56914472173953]
We consider two tasks: (1) masked number prediction, predicting a missing numerical value within a sentence, and (2) numerical anomaly detection, detecting an erroneous numeric value within a sentence.
We introduce a suite of output distribution parameterizations that incorporate latent variables to add expressivity and better fit the natural distribution of numeric values in running text.
We evaluate these models on two numeric datasets in the financial and scientific domains.
arXiv Detail & Related papers (2020-10-20T23:12:23Z)
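A sketch of one such output parameterization, assuming a mixture-of-Gaussians head over log-scaled values with the mixture component as the latent variable; the component count and sizes are illustrative.
```python
import torch
import torch.nn as nn

class MixtureNumberHead(nn.Module):
    def __init__(self, d_model=32, n_comp=5):
        super().__init__()
        # One linear map yields mixture logits, means, and log-stds.
        self.params = nn.Linear(d_model, 3 * n_comp)

    def log_prob(self, h: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
        logits, mu, log_sigma = self.params(h).chunk(3, dim=-1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        log_pi = torch.log_softmax(logits, dim=-1)
        # log p(value) = logsumexp_k [log pi_k + log N(value | mu_k, sigma_k)]
        return torch.logsumexp(log_pi + comp.log_prob(value.unsqueeze(-1)), -1)

head = MixtureNumberHead()
h = torch.randn(4, 32)                                 # contextual states
v = torch.log(torch.tensor([3.0, 12.0, 99.0, 1e4]))    # log-scaled targets
print(head.log_prob(h, v).shape)                       # torch.Size([4])
```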
This list is automatically generated from the titles and abstracts of the papers on this site.