Related papers: Value-Aware Numerical Representations for Transformer Language Models

URL: http://arxiv.org/abs/2601.09706v1
Date: Wed, 14 Jan 2026 18:59:14 GMT
Title: Value-Aware Numerical Representations for Transformer Language Models
Authors: Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu
Abstract summary: Transformer-based language models often achieve strong results on mathematical reasoning benchmarks. A central limitation is that numbers are processed as symbolic tokens whose embeddings do not explicitly encode numerical value. We introduce a value-aware numerical representation that augments standard tokenized inputs with a dedicated prefix token.
Score: 1.2680800636608986
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limitation is that numbers are processed as symbolic tokens whose embeddings do not explicitly encode numerical value, leading to systematic errors. We introduce a value-aware numerical representation that augments standard tokenized inputs with a dedicated prefix token whose embedding is explicitly conditioned on the underlying numerical value. This mechanism injects magnitude information directly into the model's input space while remaining compatible with existing tokenizers and decoder-only Transformer architectures. Evaluation on arithmetic tasks shows that the proposed approach outperforms baselines across numerical formats, tasks, and operand lengths. These results indicate that explicitly encoding numerical value is an effective and efficient way to improve fundamental numerical robustness in language models.
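The prefix-token idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prefix token name `<num>`, the sign and log-magnitude features, the embedding width, and the fixed random projection (standing in for learned parameters) are all assumptions made for illustration, and whitespace splitting stands in for a real tokenizer.

```python
import math
import random
import re

EMBED_DIM = 8          # hypothetical embedding width
NUM_PREFIX = "<num>"   # hypothetical name for the dedicated prefix token
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")

def value_features(value: float) -> list[float]:
    """Map a number to simple magnitude features (an assumed featurization)."""
    sign = 0.0 if value == 0 else math.copysign(1.0, value)
    log_magnitude = math.log10(abs(value) + 1.0)
    return [sign, log_magnitude, 1.0]  # the last entry acts as a bias term

# Stand-in for a learned projection matrix of shape EMBED_DIM x 3.
_rng = random.Random(0)
PROJECTION = [[_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(EMBED_DIM)]

def prefix_embedding(value: float) -> list[float]:
    """Compute a prefix embedding that depends on the numerical value."""
    features = value_features(value)
    return [sum(w * f for w, f in zip(row, features)) for row in PROJECTION]

def add_value_prefixes(text: str):
    """Insert NUM_PREFIX before each number and record its value-conditioned embedding.

    Returns the token list and a dict mapping prefix positions to embeddings.
    The original number tokens are left unchanged.
    """
    tokens = []
    embeddings = {}
    for piece in text.split():
        if NUMBER_RE.fullmatch(piece):
            embeddings[len(tokens)] = prefix_embedding(float(piece))
            tokens.append(NUM_PREFIX)
        tokens.append(piece)
    return tokens, embeddings

tokens, embeddings = add_value_prefixes("add 12 and 3400")
print(tokens)
print(sorted(embeddings))
```

In this sketch the number tokens themselves are untouched, which mirrors the abstract's claim of compatibility with existing tokenizers; only the embedding at each prefix position carries explicit magnitude information.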

Related papers

Training Language Models with homotokens Leads to Delayed Overfitting [2.531076482407163]
Subword tokenization introduces a computational layer in language models where many distinct token sequences decode to the same surface form and preserve meaning. We formalize homotokens as a strictly meaning-preserving form of data augmentation. In data-constrained pretraining, homotoken augmentation consistently delays overfitting under repeated data exposure. In multilingual fine-tuning, we find that the effectiveness of homotokens depends on tokenizer quality.
arXiv Detail & Related papers (2026-01-06T09:57:00Z)
How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis [0.0]
Despite its significance, tokenization in the context of assembly code remains an underexplored area. We explore preprocessing customization options and pre-tokenization rules tailored to the unique characteristics of assembly code. We compare tokenizers based on tokenization efficiency, vocabulary compression, and representational fidelity for assembly code.
arXiv Detail & Related papers (2025-11-05T19:45:26Z)
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations [83.93566096400723]
We find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly sampled tokenization. Character-level segmentation improves string manipulation and code understanding tasks by up to +14%. Right-aligned digit grouping enhances large-number arithmetic by +33%.
arXiv Detail & Related papers (2025-06-23T18:02:26Z)
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models [19.47343987998194]
Large Language Models (LLMs) have demonstrated impressive capabilities in natural language processing tasks. Their performance on numerical reasoning tasks, such as basic arithmetic, numerical, and magnitude comparison, remains surprisingly poor. Existing benchmarks primarily focus on linguistic competence or structured mathematical problem-solving.
arXiv Detail & Related papers (2025-02-16T10:48:28Z)
How to Leverage Digit Embeddings to Represent Numbers? [13.880400817682059]
In numerical reasoning, understanding numbers themselves is still a challenge for existing language models. Character-level embeddings of numbers have emerged as a promising approach to improve number representation. We use mathematical priors to compute aggregated digit embeddings and explicitly incorporate these aggregates into transformer models.
arXiv Detail & Related papers (2024-07-01T01:31:41Z)
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models [0.0]
This paper introduces an efficient and robust method for discovering interpretable circuits in large language models. We propose training sparse autoencoders on carefully designed positive and negative examples. Our findings highlight the promise of discrete sparse autoencoders for scalable and efficient mechanistic interpretability.
arXiv Detail & Related papers (2024-05-21T06:26:10Z)
xVal: A Continuous Numerical Tokenization for Scientific Language Models [41.26924657687872]
We introduce xVal, a strategy for continuously tokenizing numbers within language models. We train specially-modified language models from scratch on a variety of scientific datasets formatted as text.
arXiv Detail & Related papers (2023-10-04T17:26:16Z)
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. We take an incremental computing approach, looking to reuse calculations as the inputs change. We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition [57.51793420986745]
We propose an unconventional network for handwritten mathematical expression recognition (HMER) named Counting-Aware Network (CAN). We design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations. Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models.
arXiv Detail & Related papers (2022-07-23T08:39:32Z)
Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications. Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture. We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
NumGPT: Improving Numeracy Ability of Generative Pre-trained Models [59.931394234642816]
We propose NumGPT, a generative pre-trained model that explicitly models the numerical properties of numbers in texts. Specifically, it leverages a prototype-based numeral embedding to encode the mantissa of the number and an individual embedding to encode the exponent of the number. A numeral-aware loss function is designed to integrate numerals into the pre-training objective of NumGPT.
arXiv Detail & Related papers (2021-09-07T15:06:12Z)
Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model. We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder. We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
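
The right-aligned digit grouping mentioned in the "Broken Tokens?" entry above can be illustrated with a short sketch. The group size of three and both function names are assumptions for illustration, not details taken from that paper; the point is only that grouping from the right keeps chunk boundaries aligned with place value (thousands, millions), whereas grouping from the left does not.

```python
def group_digits_right_aligned(digits: str, group_size: int = 3) -> list[str]:
    """Split a digit string into chunks so that the rightmost chunk is always full."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of digits")
    head = len(digits) % group_size  # length of the short leading chunk, if any
    groups = [digits[:head]] if head else []
    groups += [digits[i:i + group_size] for i in range(head, len(digits), group_size)]
    return groups

def group_digits_left_aligned(digits: str, group_size: int = 3) -> list[str]:
    """Split a digit string into chunks from the left, for comparison."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of digits")
    return [digits[i:i + group_size] for i in range(0, len(digits), group_size)]

print(group_digits_right_aligned("1234567"))  # ['1', '234', '567']
print(group_digits_left_aligned("1234567"))   # ['123', '456', '7']
```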

This list is automatically generated from the titles and abstracts of the papers on this site.