FoNE: Precise Single-Token Number Embeddings via Fourier Features
- URL: http://arxiv.org/abs/2502.09741v1
- Date: Thu, 13 Feb 2025 19:54:59 GMT
- Title: FoNE: Precise Single-Token Number Embeddings via Fourier Features
- Authors: Tianyi Zhou, Deqing Fu, Mahdi Soltanolkotabi, Robin Jia, Vatsal Sharan
- Abstract summary: We propose a novel method that maps numbers into the embedding space with their Fourier features.
FoNE encodes each number as a single token with only two embedding dimensions per digit, effectively capturing numerical values without fragmentation.
On 6-digit decimal addition, FoNE requires 64$\times$ less data than subword and digit-wise embeddings to achieve 99% accuracy.
FoNE is the only method that yields 100% accuracy on over 100,000 test examples for addition, subtraction, and multiplication.
- Score: 51.17846016593835
- Abstract: Large Language Models (LLMs) typically represent numbers using multiple tokens, which requires the model to aggregate these tokens to interpret numerical values. This fragmentation makes both training and inference less efficient and adversely affects the model's performance on number-related tasks. Inspired by the observation that pre-trained LLMs internally learn Fourier-like features for number tokens, we propose Fourier Number Embedding (FoNE), a novel method that directly maps numbers into the embedding space with their Fourier features. FoNE encodes each number as a single token with only two embedding dimensions per digit, effectively capturing numerical values without fragmentation. This compact representation accelerates both training and inference. Compared to traditional subword and digit-wise embeddings, FoNE not only reduces computational overhead but also achieves higher accuracy across various numerical tasks including addition, subtraction and multiplication. On 6-digit decimal addition, FoNE requires 64$\times$ less data to achieve 99% accuracy than subword and digit-wise embeddings while using 3$\times$ and 6$\times$ fewer tokens per number, respectively. Furthermore, FoNE is the only method that yields 100% accuracy on over 100,000 test examples for addition, subtraction, and multiplication. The code and visualizations are available at https://fouriernumber.github.io/.
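To make the embedding scheme concrete, here is a minimal sketch of a FoNE-style encoding in NumPy: each digit position contributes a (cos, sin) pair with period $10^i$, so a whole number occupies a single token with two embedding dimensions per digit. The function names, the integer-only restriction, and the phase-based decoder are illustrative assumptions for this sketch, not the authors' released implementation (see https://fouriernumber.github.io/ for that).

```python
import numpy as np

def fone_embed(x: int, num_digits: int = 6) -> np.ndarray:
    """Sketch of a FoNE-style embedding: two Fourier dimensions per digit.

    The component with period 10^i has phase 2*pi*(x mod 10^i)/10^i, so the
    digit in the 10^(i-1) place is recoverable from that phase alone.
    Illustrative only; not the paper's released code.
    """
    feats = []
    for i in range(1, num_digits + 1):
        angle = 2.0 * np.pi * x / (10.0 ** i)
        feats.extend([np.cos(angle), np.sin(angle)])
    return np.asarray(feats)  # shape: (2 * num_digits,)

def fone_decode(emb: np.ndarray, num_digits: int = 6) -> int:
    """Invert the sketch: read x mod 10^i off each (cos, sin) pair's phase,
    then take successive differences to recover one digit per position."""
    x = 0
    prev_mod = 0.0
    for i in range(1, num_digits + 1):
        cos_v, sin_v = emb[2 * i - 2], emb[2 * i - 1]
        period = 10.0 ** i
        x_mod = (np.arctan2(sin_v, cos_v) / (2.0 * np.pi)) % 1.0 * period
        digit = int(round((x_mod - prev_mod) / (10.0 ** (i - 1)))) % 10
        x += digit * 10 ** (i - 1)
        prev_mod = x_mod
    return x

emb = fone_embed(123456)
print(emb.shape)         # (12,): 6 digits * 2 dimensions per digit
print(fone_decode(emb))  # 123456
```

In the paper these per-digit features are then placed into the model's full embedding width; the $(\cos, \sin)$ pairs above are meant only to illustrate the core idea of one token, two dimensions per digit.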
Related papers
- Interleaving Text and Number Embeddings to Solve Mathematics Problems [0.0]
We build upon a recent approach by introducing more expressive numerical embeddings.
Our method addresses key shortcomings, including the elimination of numerical artefacts and the ability to handle a wide range of magnitudes without clipping.
arXiv Detail & Related papers (2024-10-25T07:21:57Z) - Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia [55.23627698804683]
We study the scaling behavior of different numeral systems in the context of transformer-based large language models.
A base $10$ system is consistently more data-efficient than a base $10^2$ or $10^3$ system across training data scales.
We identify that base $100$ and base $1000$ systems struggle with token-level discernment and token-level operations (see the tokenization sketch after this list).
arXiv Detail & Related papers (2024-09-25T22:08:31Z) - How to Leverage Digit Embeddings to Represent Numbers? [13.880400817682059]
In numerical reasoning, understanding numbers themselves is still a challenge for existing language models.
Character-level embeddings of numbers have emerged as a promising approach to improve number representation.
We use mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models.
arXiv Detail & Related papers (2024-07-01T01:31:41Z) - Matryoshka Query Transformer for Large Vision-Language Models [103.84600181927884]
We introduce the Matryoshka Query Transformer (MQT), capable of encoding an image into $m$ visual tokens during inference.
We train a single model once, and can flexibly and drastically reduce the number of inference-time visual tokens.
Our model, MQT-LLAVA, matches LLaVA-1.5 performance across 11 benchmarks using a maximum of 256 tokens instead of LLaVA's fixed 576.
arXiv Detail & Related papers (2024-05-29T17:39:42Z) - NumeroLogic: Number Encoding for Enhanced LLMs' Numerical Reasoning [27.584258258635945]
Language models struggle with handling numerical data and performing arithmetic operations.
We propose a simple adjustment to how numbers are represented by including the count of digits before each number.
By requiring the model to consider the number of digits first, it enhances the reasoning process before generating the actual number.
arXiv Detail & Related papers (2024-03-30T19:46:59Z) - Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs [3.6722413665749674]
Tokenization is the division of input text into discrete tokens.
We study the effect this choice has on numerical reasoning through the use of arithmetic tasks.
arXiv Detail & Related papers (2024-02-22T18:14:09Z) - Positional Description Matters for Transformers Arithmetic [58.4739272381373]
Transformers often falter on arithmetic tasks despite their vast capabilities.
We propose several ways to fix the issue, either by modifying the positional encoding directly, or by modifying the representation of the arithmetic task to leverage standard positional encoding differently.
arXiv Detail & Related papers (2023-11-22T00:31:01Z) - NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts.
Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number.
A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
arXiv Detail & Related papers (2021-09-07T15:06:12Z) - Investigating the Limitations of the Transformers with Simple Arithmetic Tasks [10.23804850480924]
We find that how a number is represented in its surface form has a strong influence on the model's accuracy.
We conclude that modern pretrained language models can easily learn arithmetic from very few examples, provided the numbers are given a suitable surface representation.
arXiv Detail & Related papers (2021-02-25T17:22:53Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
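As a rough illustration of the numeral-system comparison in the scaling-behavior entry above, the sketch below splits a number into one-, two-, or three-digit tokens, which is what base $10$, base $10^2$, and base $10^3$ representations amount to at the tokenizer level. The function name and the left-padding convention are illustrative choices for this sketch, not the tokenizers used in the cited paper.

```python
def tokenize_number(n: int, digits_per_token: int = 1) -> list[str]:
    """Illustrative only: chunk a number into fixed-width digit groups,
    mimicking base-10 (1 digit/token), base-100 (2), or base-1000 (3)."""
    s = str(n)
    s = "0" * ((-len(s)) % digits_per_token) + s  # left-pad to whole chunks
    return [s[i:i + digits_per_token] for i in range(0, len(s), digits_per_token)]

print(tokenize_number(123456, 1))  # ['1', '2', '3', '4', '5', '6']  -> base 10
print(tokenize_number(123456, 2))  # ['12', '34', '56']              -> base 100
print(tokenize_number(123456, 3))  # ['123', '456']                  -> base 1000
```

Larger chunks mean fewer tokens per number but a much larger numeric vocabulary, which is one way to read the cited finding that base-$100$ and base-$1000$ systems struggle with token-level discernment and operations.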