Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
- URL: http://arxiv.org/abs/2602.10346v1
- Date: Tue, 10 Feb 2026 22:36:48 GMT
- Title: Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models
- Authors: Arash Gholami Davoodi, Navid Rezazadeh, Seyed Pouyan Mousavi Davoudi, Pouya Pezeshkpour,
- Abstract summary: Top-W is a geometry-aware truncation rule that uses Wasserstein distance defined over token-embedding geometry. We show that Top-W consistently outperforms prior state-of-the-art decoding approaches, achieving up to 33.7% improvement.
- Score: 9.059725329168435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic, relying mainly on probability mass and entropy while ignoring the semantic geometry of the token space. We present Top-W, a geometry-aware truncation rule that uses Wasserstein distance, defined over token-embedding geometry, to keep the cropped distribution close to the original, while explicitly balancing retained probability mass against the entropy of the kept set. Our theory yields a simple closed-form structure for the fixed-potential subset update: depending on the mass-entropy trade-off, the optimal crop either collapses to a single token or takes the form of a one-dimensional prefix that can be found efficiently with a linear scan. We implement Top-W using efficient geometry-based potentials (nearest-set or k-NN) and pair it with an alternating decoding routine that keeps the standard truncation-and-sampling interface unchanged. Extensive experiments on four benchmarks (GSM8K, GPQA, AlpacaEval, and MT-Bench) across three instruction-tuned models show that Top-W consistently outperforms prior state-of-the-art decoding approaches, achieving up to 33.7% improvement. Moreover, we find that Top-W not only improves accuracy-focused performance but also boosts creativity under judge-based open-ended evaluation.
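The prefix idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the objective (nearest-set transport cost plus `lam` times the entropy of the kept set), the function names, and the brute-force recomputation per prefix are all assumptions made for illustration, and the full pairwise distance matrix is only practical for small vocabularies.

```python
import numpy as np

def nearest_set_cost(p, dist, keep):
    """Cost of moving each dropped token's mass to its nearest kept token."""
    dropped = np.setdiff1d(np.arange(len(p)), keep)
    if dropped.size == 0:
        return 0.0
    nearest = dist[np.ix_(dropped, keep)].min(axis=1)
    return float(p[dropped] @ nearest)

def prefix_scan_truncate(p, emb, lam=0.1):
    """Pick the probability-sorted prefix minimizing a hypothetical objective.

    Objective (an assumption, not the paper's): transport cost + lam * entropy.
    Returns the kept token indices and the renormalized cropped distribution.
    """
    p = np.asarray(p, dtype=float)
    emb = np.asarray(emb, dtype=float)
    order = np.argsort(-p)  # tokens from most to least probable
    # Pairwise Euclidean distances between token embeddings (toy-scale only).
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)

    best_k, best_obj = 1, np.inf
    for k in range(1, len(p) + 1):  # linear scan over prefix lengths
        keep = order[:k]
        q = p[keep] / p[keep].sum()
        q = q[q > 0]  # guard against log(0)
        entropy = -np.sum(q * np.log(q))
        obj = nearest_set_cost(p, dist, keep) + lam * entropy
        if obj < best_obj:
            best_k, best_obj = k, obj

    keep = order[:best_k]
    cropped = np.zeros_like(p)
    cropped[keep] = p[keep] / p[keep].sum()
    return keep, cropped

# Usage: four tokens with 2-D embeddings (made-up values).
p = np.array([0.5, 0.3, 0.15, 0.05])
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
keep, cropped = prefix_scan_truncate(p, emb, lam=0.1)
```

A large `lam` pushes the crop toward a single token, while `lam=0` keeps every token because the transport cost alone is minimized by dropping nothing. This mirrors, in toy form, the single-token versus prefix behavior the abstract describes.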
Related papers
- Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z)
- Inverting Self-Organizing Maps: A Unified Activation-Based Framework [39.146761527401424]
We show that the activation pattern of a SOM can be inverted to recover the exact input under mild geometric conditions. We introduce the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule. We validate the approach using synthetic Gaussian mixtures and the MNIST and Faces in the Wild datasets.
arXiv Detail & Related papers (2026-01-20T11:02:54Z)
- Latent Geometry of Taste: Scalable Low-Rank Matrix Factorization for Recommender Systems [0.0]
This work investigates the latent geometry of user preferences using the MovieLens 32M dataset. We demonstrate that constrained low-rank models significantly outperform higher-dimensional counterparts in terms of ranking precision. We validate the system's practical utility in a cold-start scenario, introducing a tunable scoring parameter to manage the trade-off between popularity bias and personalized affinity effectively.
arXiv Detail & Related papers (2026-01-06T23:42:40Z)
- Scaling Bidirectional Spans and Span Violations in Attention Mechanism [5.755498052202004]
The canonical $O(N^2)$ Transformer remains the empirical performance frontier in sequence modeling. We propose an optimization framework that leverages an asymmetric projection to decompose the backward-pass gradients into parallel spans. We demonstrate that selectively scaling these components, focusing primarily on $0^{th}$-order bidirectional parallel spans, yields the most effective learning signal.
arXiv Detail & Related papers (2025-12-15T07:03:24Z)
- Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design preserves intrinsic degradation signals robustly, rendering simple additive fusion in skip connections.
arXiv Detail & Related papers (2025-12-11T12:20:31Z)
- Residual Primitive Fitting of 3D Shapes with SuperFrusta [33.835182155141695]
We introduce a framework for converting 3D shapes into compact and editable assemblies of analytic primitives. Our approach combines two key contributions: a novel primitive, termed SuperFrustum, and an iterative fitting algorithm, Residual Primitive Fitting.
arXiv Detail & Related papers (2025-12-09T23:58:51Z)
- Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model [32.831576387973875]
We propose a two-stage deterministic framework for stable, accurate and fine-grained geometric dense prediction. Specifically, in the first stage, the core predictor employs a single-step deterministic formulation with a clean-data objective. In the second stage, the detail sharpener performs a constrained multi-step rectified-flow refinement within the manifold defined by the core predictor.
arXiv Detail & Related papers (2025-11-30T18:57:25Z)
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework. We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition. Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv Detail & Related papers (2025-09-29T16:18:32Z)
- PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation [70.98107766265636]
This paper takes the geometric attributes of pre-trained weights as a starting point, systematically analyzing three key components: magnitude, absolute angle, and pairwise angular structure. We find that the pairwise angular structure remains stable across diverse corrupted domains and encodes domain-invariant semantic information, suggesting it should be preserved during adaptation.
arXiv Detail & Related papers (2025-06-03T05:18:15Z)
- CWF: Consolidating Weak Features in High-quality Mesh Simplification [50.634070540791555]
We propose a smooth functional that simultaneously considers all of these requirements.
The functional comprises a normal anisotropy term and a Centroidal Voronoi Tessellation (CVT) energy term.
arXiv Detail & Related papers (2024-04-24T05:37:17Z)
- Canny-VO: Visual Odometry with RGB-D Cameras based on Geometric 3D-2D Edge Alignment [85.32080531133799]
This paper reviews the classical problem of free-form curve registration and applies it to an efficient RGBD visual odometry system called Canny-VO.
Two replacements for the distance transformation commonly used in edge registration are proposed: Approximate Nearest Neighbour Fields and Oriented Nearest Neighbour Fields.
3D-2D edge alignment benefits from these alternative formulations in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2020-12-15T11:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.