Related papers: Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models

Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models

URL: http://arxiv.org/abs/2602.10346v1
Date: Tue, 10 Feb 2026 22:36:48 GMT
Title: Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
Authors: Arash Gholami Davoodi, Navid Rezazadeh, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour,
Abstract summary: Top-W is a geometry-aware truncation rule that uses Wasserstein distance-defined over token-embedding geometry.<n>We show that Top-W consistently outperforms prior state-of-the-art decoding approaches achieving up to 33.7% improvement.
Score: 9.059725329168435
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic, relying mainly on probability mass and entropy while ignoring semantic geometry of the token space. We present Top-W, a geometry-aware truncation rule that uses Wasserstein distance-defined over token-embedding geometry-to keep the cropped distribution close to the original, while explicitly balancing retained probability mass against the entropy of the kept set. Our theory yields a simple closed-form structure for the fixed-potential subset update: depending on the mass-entropy trade-off, the optimal crop either collapses to a single token or takes the form of a one-dimensional prefix that can be found efficiently with a linear scan. We implement Top-W using efficient geometry-based potentials (nearest-set or k-NN) and pair it with an alternating decoding routine that keeps the standard truncation-and-sampling interface unchanged. Extensive experiments on four benchmarks (GSM8K, GPQA, AlpacaEval, and MT-Bench) across three instruction-tuned models show that Top-W consistently outperforms prior state-of-the-art decoding approaches achieving up to 33.7% improvement. Moreover, we find that Top-W not only improves accuracy-focused performance, but also boosts creativity under judge-based open-ended evaluation.

Related papers

Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining.<n>PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead.<n>We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z)
Inverting Self-Organizing Maps: A Unified Activation-Based Framework [39.146761527401424]
We show that the activation pattern of a SOM can be inverted to recover the exact input under mild geometric conditions.<n>We introduce the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule.<n>We validate the approach using synthetic Gaussian mixtures, the MNIST and the Faces in the Wild dataset.
arXiv Detail & Related papers (2026-01-20T11:02:54Z)
Latent Geometry of Taste: Scalable Low-Rank Matrix Factorization for Recommender Systems [0.0]
This work investigates the latent geometry of user preferences using the MovieLens 32M dataset.<n>We demonstrate that constrained low-rank models significantly outperform higher dimensional counterparts in terms of ranking precision.<n>We validate the system's practical utility in a cold-start scenario, introducing a tunable scoring parameter to manage the trade-off between popularity bias and personalized affinity effectively.
arXiv Detail & Related papers (2026-01-06T23:42:40Z)
Scaling Bidirectional Spans and Span Violations in Attention Mechanism [5.755498052202004]
canonical $O(N2)$ Transformer remains the empirical performance frontier in sequence modeling.<n>We propose an optimization framework that leverages an asymmetric projection to decompose the backward-pass gradients into parallel spans.<n>We demonstrate that selectively scaling these components, focusing primarily on $0th$ order bidirectional parallel spans, yields the most effective learning signal.
arXiv Detail & Related papers (2025-12-15T07:03:24Z)
Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework.<n>In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information.<n>Our symmetric design preserves intrinsic degradation signals robustly, rendering simple additive fusion in skip connections.
arXiv Detail & Related papers (2025-12-11T12:20:31Z)
Residual Primitive Fitting of 3D Shapes with SuperFrusta [33.835182155141695]
We introduce a framework for converting 3D shapes into compact and editable assemblies of analytic primitives.<n>Our approach combines two key contributions: a novel primitive, termed SuperFrustum, and an iterative fiting algorithm, Residual Primitive Fitting.
arXiv Detail & Related papers (2025-12-09T23:58:51Z)
Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model [32.831576387973875]
We propose a two-stage deterministic framework for stable, accurate and fine-grained geometric dense prediction.<n>Specifically, in the first stage, the core predictor employs a single-step deterministic formulation with a clean-data objective.<n>In the second stage, the detail sharpener performs a constrained multi-step rectified-flow refinement within the manifold defined by the core predictor.
arXiv Detail & Related papers (2025-11-30T18:57:25Z)
Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework.<n>We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition.<n>Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv Detail & Related papers (2025-09-29T16:18:32Z)
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure.<n>We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
CWF: Consolidating Weak Features in High-quality Mesh Simplification [50.634070540791555]
We propose a smooth functional that simultaneously considers all of these requirements. The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term.
arXiv Detail & Related papers (2024-04-24T05:37:17Z)
Canny-VO: Visual Odometry with RGB-D Cameras based on Geometric 3D-2D Edge Alignment [85.32080531133799]
This paper reviews the classical problem of free-form curve registration and applies it to an efficient RGBD visual odometry system called Canny-VO. Two replacements for the distance transformation commonly used in edge registration are proposed: Approximate Nearest Neighbour Fields and Oriented Nearest Neighbour Fields. 3D2D edge alignment benefits from these alternative formulations in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2020-12-15T11:42:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.