Related papers: SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models

URL: http://arxiv.org/abs/2601.22805v1
Date: Fri, 30 Jan 2026 10:34:07 GMT
Title: SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models
Authors: Pit Neitemeier, Alessio Serra, Jiaze Li, Sascha Wirges, Lukas Balles, Jan Hendrik Metzen,
Abstract summary: We introduce a router-agnostic metric of boundary quality, boundary enrichment B, which measures how strongly chunk starts concentrate on positions with high next-byte surprisal. We propose Sombrero, which steers boundary placement toward predictive difficulty via a confidence-alignment boundary loss and stabilizes boundary learning by applying confidence-weighted smoothing at the input level rather than on realized chunks.
Score: 10.547898683606569
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hierarchical sequence models replace fixed tokenization with learned segmentations that compress long byte sequences for efficient autoregressive modeling. While recent end-to-end methods can learn meaningful boundaries from the language-modeling objective alone, it remains difficult to quantitatively assess and systematically steer where compute is spent. We introduce a router-agnostic metric of boundary quality, boundary enrichment B, which measures how strongly chunk starts concentrate on positions with high next-byte surprisal. Guided by this metric, we propose Sombrero, which steers boundary placement toward predictive difficulty via a confidence-alignment boundary loss and stabilizes boundary learning by applying confidence-weighted smoothing at the input level rather than on realized chunks. At the 1B scale, across UTF-8 corpora covering English and German text as well as code and mathematical content, Sombrero improves the accuracy-efficiency trade-off and yields boundaries that more consistently align compute with hard-to-predict positions.
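
The abstract does not say how B is computed. The Python sketch below is for illustration only and assumes one plausible form. It labels the top `top_frac` share of positions by next-byte surprisal as "high". It then divides the share of chunk starts that land on high positions by the share of all positions that are high. The function name `boundary_enrichment`, the `top_frac` threshold, and the ratio itself are hypothetical and are not taken from the paper. Per-byte surprisal values (-log p of the next byte under some model) are assumed to be given.

```python
from typing import Sequence


def boundary_enrichment(
    surprisal: Sequence[float],
    is_chunk_start: Sequence[bool],
    top_frac: float = 0.2,
) -> float:
    """Hypothetical enrichment ratio: P(high surprisal | chunk start) / P(high surprisal).

    This is an assumed definition for illustration, not the paper's formula.
    A value of 1 means chunk starts ignore surprisal. Values above 1 mean chunk
    starts favour hard-to-predict bytes, and values below 1 mean they avoid them.
    """
    n = len(surprisal)
    if n == 0 or n != len(is_chunk_start):
        raise ValueError("inputs must be non-empty and of equal length")
    starts = [i for i, flag in enumerate(is_chunk_start) if flag]
    if not starts:
        raise ValueError("need at least one chunk start")
    # Assumed threshold rule: the k highest-surprisal positions count as "high".
    k = min(n, max(1, round(top_frac * n)))
    ranked = sorted(range(n), key=lambda i: surprisal[i], reverse=True)
    high = set(ranked[:k])
    base_rate = k / n                                         # P(high) over all positions
    hit_rate = sum(i in high for i in starts) / len(starts)  # P(high | chunk start)
    return hit_rate / base_rate


# Usage: chunk starts placed exactly on the three highest-surprisal bytes.
surprisal = [0.1, 3.0, 0.2, 0.1, 2.5, 0.3, 0.2, 4.0, 0.1, 0.2]
starts = [i in (1, 4, 7) for i in range(10)]
print(boundary_enrichment(surprisal, starts, top_frac=0.3))  # ≈ 3.33 (= 1 / 0.3)
```

Under this assumed form, the largest attainable value is 1 / `top_frac`, which is reached when every chunk start falls on a high-surprisal byte. Placing a chunk start at every position gives exactly 1.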

Related papers

You Can Learn Tokenization End-to-End with Reinforcement Learning [34.662213518530315]
Tokenization is a hardcoded compression step which remains in the training pipeline of Large Language Models (LLMs). We show that these token boundaries can instead be learned using score function estimates, which have tighter theoretical guarantees. We demonstrate that the resultant method outperforms prior proposed straight-through estimates, both qualitatively and quantitatively.
Date: 2026-02-15T00:31:24Z
BoundMatch: Boundary detection applied to semi-supervised segmentation [12.8995997687175]
Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images. We propose BoundMatch, a novel multi-task SS-SS framework that explicitly integrates semantic boundary detection into a teacher-student consistency regularization pipeline. Our core mechanism, Boundary Consistency Regularized Multi-Task Learning, enforces prediction agreement between teacher and student models on both segmentation masks and detailed semantic boundaries.
Date: 2025-03-30T17:02:26Z
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy while offering the clear advantages of a zero exemplar buffer and only 1.02x the base model.
Date: 2024-01-17T09:01:29Z
$\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information. In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization. We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
Date: 2022-06-22T02:42:04Z
Inverse Boundary Value and Optimal Control Problems on Graphs: A Neural and Numerical Synthesis [0.0]
A key component of the present architecture is our boundary-injected message passing neural network. A regularization technique based on graphical distance is introduced that helps stabilize the predictions at nodes far from the boundary.
Date: 2022-06-06T21:26:23Z
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality. The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark. By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
Date: 2021-04-12T07:10:48Z
Active Boundary Loss for Semantic Segmentation [58.72057610093194]
This paper proposes a novel active boundary loss for semantic segmentation. It can progressively encourage the alignment between predicted boundaries and ground-truth boundaries during end-to-end training. Experimental results show that training with the active boundary loss can effectively improve the boundary F-score and mean Intersection-over-Union.
Date: 2021-02-04T15:47:54Z
Think about boundary: Fusing multi-level boundary information for landmark heatmap regression [51.48533538153833]
We study a two-stage but end-to-end approach for exploring the relationship between the facial boundary and landmarks. The approach, which produces boundary-aware landmark predictions, consists of two modules: the self-calibrated boundary estimation (SCBE) module and the boundary-aware landmark transform (BALT) module. Our approach outperforms state-of-the-art methods in the literature.
Date: 2020-08-25T10:14:13Z
CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection [0.0]
We present a novel framework that integrates multi-scale context aggregation, advanced attention mechanisms, and an uncertainty-aware module for improved SOD performance. Our Adaptive Cross-Scale Context Module effectively fuses features from multiple levels, leveraging Recursive Channel Spatial Attention and Convolutional Block Attention. To train our network robustly, we employ a combination of boundary-sensitive and topology-preserving loss functions, including Boundary IoU, Focal Tversky, and Topological Saliency losses.
Date: 2020-06-04T18:33:59Z
DeepStrip: High Resolution Boundary Refinement [60.00241966809684]
We propose to convert regions of interest into strip images and compute a boundary prediction in the strip domain. To detect the target boundary, we present a framework with two prediction layers. We apply matching-consistency and C0-continuity regularization to the network to reduce false alarms.
Date: 2020-03-25T22:44:48Z

This list is automatically generated from the titles and abstracts of the papers on this site.