Advancing Sequential Numerical Prediction in Autoregressive Models
- URL: http://arxiv.org/abs/2505.13077v2
- Date: Wed, 28 May 2025 10:48:01 GMT
- Title: Advancing Sequential Numerical Prediction in Autoregressive Models
- Authors: Xiang Fei, Jinghui Lu, Qi Sun, Hao Feng, Yanjie Wang, Wei Shi, An-Lan Wang, Jingqun Tang, Can Huang
- Abstract summary: This paper introduces Numerical Token Integrity Loss (NTIL) to address this gap. NTIL operates at two levels: (1) token-level, where it extends the Earth Mover's Distance (EMD) to preserve ordinal relationships between numerical values, and (2) sequence-level, where it penalizes the overall discrepancy between the predicted and actual sequences.
- Score: 26.759068834681738
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Autoregressive models have become the de facto choice for sequence generation tasks, but standard approaches treat digits as independent tokens and apply cross-entropy loss, overlooking the coherent structure of numerical sequences. This paper introduces Numerical Token Integrity Loss (NTIL) to address this gap. NTIL operates at two levels: (1) token-level, where it extends the Earth Mover's Distance (EMD) to preserve ordinal relationships between numerical values, and (2) sequence-level, where it penalizes the overall discrepancy between the predicted and actual sequences. This dual approach improves numerical prediction and integrates effectively with LLMs/MLLMs. Extensive experiments show significant performance improvements with NTIL.
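As a concrete illustration of the token-level term, here is a minimal PyTorch sketch based only on the abstract's description: a 1-D EMD between the predicted distribution over digit tokens and the one-hot target, computed in closed form as the L1 distance between CDFs. The digit-token ordering and the omission of NTIL's sequence-level term are simplifications; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def token_level_emd_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (batch, 10) scores over digit tokens, assumed ordered 0..9;
    targets: (batch,) ground-truth digit indices."""
    probs = F.softmax(logits, dim=-1)
    onehot = F.one_hot(targets, num_classes=probs.size(-1)).float()
    # For 1-D distributions, EMD reduces to the L1 distance between CDFs,
    # so predicting "7" against a target of "8" is penalized less than "1".
    emd = (torch.cumsum(probs, dim=-1) - torch.cumsum(onehot, dim=-1)).abs().sum(dim=-1)
    return emd.mean()
```

The sequence-level term described in the abstract would additionally penalize the discrepancy between the full predicted and ground-truth sequences; it is omitted here because the abstract does not specify its exact form.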
Related papers
- Similarity-Distance-Magnitude Language Models [0.0]
We introduce Similarity-Distance-Magnitude (SDM) language models (LMs).
SDM LMs are sequence prediction models fine-tuned to maximize the proportion of generations in the well-calibrated, high-probability region partitioned by a final-layer SDM activation layer used for binary classification of instruction-following.
arXiv Detail & Related papers (2025-10-30T06:42:15Z)
- Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing [4.707859580472452]
Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation.
They enable parallel token sampling, rather than sequential, left-to-right generation.
We present PUNT, a model-agnostic sampler that reconciles this trade-off.
arXiv Detail & Related papers (2025-10-24T18:41:26Z)
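To illustrate the parallel token sampling the entry above refers to, here is a generic parallel-unmasking step for a masked diffusion model in its simplest confidence-threshold form. PUNT's conditional-independence testing is more sophisticated and is not reproduced here; the `model` interface and `mask_id` are assumptions for the sketch.

```python
import torch

@torch.no_grad()
def parallel_unmask_step(model, tokens, mask_id, threshold=0.9):
    """tokens: (batch, seq_len) with unfilled positions set to mask_id;
    model(tokens) is assumed to return per-position logits (batch, seq_len, vocab)."""
    probs = torch.softmax(model(tokens), dim=-1)
    conf, pred = probs.max(dim=-1)            # per-position confidence and argmax token
    masked = tokens == mask_id
    accept = masked & (conf >= threshold)     # unmask all confident positions in parallel
    # Guarantee progress: rows with masked positions but no confident one
    # unmask their single most confident masked position.
    stuck = masked.any(dim=-1) & ~accept.any(dim=-1)
    if stuck.any():
        best = conf.masked_fill(~masked, -1.0).argmax(dim=-1)
        accept[stuck, best[stuck]] = True
    return torch.where(accept, pred, tokens)
```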
- TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge [59.57934574562651]
TRACT (Two-stage Regression-Aware fine-tuning with CoT) is a method combining CoT reasoning with regression-aware training.
Experiments across four LLM-as-a-judge datasets and two LLMs show that TRACT significantly outperforms existing methods.
arXiv Detail & Related papers (2025-03-06T12:33:20Z)
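One common instantiation of regression-aware training for LLM-as-a-judge is to regress the expected score implied by the token distribution toward the gold score, rather than applying pure cross-entropy on the score token. The sketch below shows that generic idea; TRACT's actual two-stage CoT recipe is more elaborate, and the 1-5 rating scale and token ids are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def regression_aware_loss(score_logits, gold_scores, rating_token_ids):
    """score_logits: (batch, vocab) logits at the position that emits the score;
    gold_scores: (batch,) float gold ratings; rating_token_ids: ids of the tokens "1".."5"."""
    ratings = torch.arange(1.0, 1.0 + len(rating_token_ids))        # numeric values 1..5
    probs = F.softmax(score_logits[:, rating_token_ids], dim=-1)    # renormalize over rating tokens
    expected = probs @ ratings                                      # (batch,) expected rating
    return F.mse_loss(expected, gold_scores)
```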
- Non-autoregressive Sequence-to-Sequence Vision-Language Models [59.445765313094434]
We propose a parallel decoding sequence-to-sequence vision-language model that marginalizes over multiple inference paths in the decoder.
The model achieves performance on par with its state-of-the-art autoregressive counterpart, but is faster at inference time.
arXiv Detail & Related papers (2024-03-04T17:34:59Z)
- Symbolic Autoencoding for Self-Supervised Sequence Learning [24.71036683224435]
$\Sigma$AE is a self-supervised framework that harnesses the power of abundant non-parallel data alongside limited parallel data.
Our results demonstrate that $\Sigma$AE significantly enhances performance on transduction tasks, even with minimal parallel data.
arXiv Detail & Related papers (2024-02-16T11:04:31Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood estimation (MLE) objective does not match the downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Perception and Semantic Aware Regularization for Sequential Confidence Calibration [12.265757315192497]
We propose a Perception and Semantic aware Sequence Regularization framework.
We introduce a semantic context-free recognition model and a language model to acquire similar sequences with high perceptual similarity and semantic correlation.
Experiments on canonical sequence recognition tasks, including scene text and speech recognition, demonstrate that our method achieves new state-of-the-art results.
arXiv Detail & Related papers (2023-05-31T02:16:29Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
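For the sequence-level half of the objective above, a generic InfoNCE contrastive loss over sequence embeddings looks like the sketch below. CMLM's exact construction of positives and negatives is described in the paper and is not reproduced here; the two-view setup is an assumption.

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(anchors, positives, temperature=0.07):
    """anchors, positives: (batch, dim) embeddings of two views of the same sequences;
    the other in-batch sequences serve as negatives."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature      # (batch, batch) scaled cosine similarities
    labels = torch.arange(a.size(0))      # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)
```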
- Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt the contextual generation of categorical sequences to a policy-gradient estimator that evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
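The general idea behind correlated MC rollouts for variance control can be sketched with a leave-one-out baseline in a REINFORCE-style estimator: sample several rollouts per context and baseline each one against the mean reward of the others. This is a standard variance-reduction device, not the paper's adaptive correlation scheme.

```python
import torch

def correlated_rollout_loss(log_probs, rewards):
    """log_probs: (batch, k) summed log-probabilities of k rollouts per context;
    rewards: (batch, k) their rewards; requires k >= 2."""
    k = rewards.size(1)
    # Leave-one-out baseline: mean reward of the other k-1 rollouts from the
    # same context, correlated with (but independent of) each sample.
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    advantages = (rewards - baseline).detach()    # no gradient through the baseline
    return -(advantages * log_probs).mean()       # REINFORCE surrogate loss
```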