SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models
- URL: http://arxiv.org/abs/2601.22805v1
- Date: Fri, 30 Jan 2026 10:34:07 GMT
- Title: SOMBRERO: Measuring and Steering Boundary Placement in End-to-End Hierarchical Sequence Models
- Authors: Pit Neitemeier, Alessio Serra, Jiaze Li, Sascha Wirges, Lukas Balles, Jan Hendrik Metzen,
- Abstract summary: We introduce a router-agnostic metric of boundary quality, boundary enrichment B, which measures how strongly chunk starts concentrate on positions with high next-byte surprisal. We propose Sombrero, which steers boundary placement toward predictive difficulty via a confidence-alignment boundary loss and stabilizes boundary learning by applying confidence-weighted smoothing at the input level rather than on realized chunks.
- Score: 10.547898683606569
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical sequence models replace fixed tokenization with learned segmentations that compress long byte sequences for efficient autoregressive modeling. While recent end-to-end methods can learn meaningful boundaries from the language-modeling objective alone, it remains difficult to quantitatively assess and systematically steer where compute is spent. We introduce a router-agnostic metric of boundary quality, boundary enrichment B, which measures how strongly chunk starts concentrate on positions with high next-byte surprisal. Guided by this metric, we propose Sombrero, which steers boundary placement toward predictive difficulty via a confidence-alignment boundary loss and stabilizes boundary learning by applying confidence-weighted smoothing at the input level rather than on realized chunks. At the 1B scale, across UTF-8 corpora covering English and German text as well as code and mathematical content, Sombrero improves the accuracy-efficiency trade-off and yields boundaries that more consistently align compute with hard-to-predict positions.
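The abstract describes boundary enrichment B only informally, as a measure of how strongly chunk starts concentrate on high-surprisal positions. The following is a minimal sketch of one plausible reading, not the paper's definition: it assumes B is the ratio of mean next-byte surprisal at chunk starts to mean surprisal over all positions, so that a value above 1 means boundaries favor hard-to-predict bytes. The function name `boundary_enrichment` and both inputs are hypothetical.

```python
from typing import Sequence


def boundary_enrichment(surprisal: Sequence[float],
                        is_boundary: Sequence[bool]) -> float:
    """Illustrative enrichment ratio (an assumption, not the paper's formula).

    surprisal[i]   : next-byte surprisal (e.g. -log p) at position i
    is_boundary[i] : True if a chunk starts at position i
    Returns mean surprisal at chunk starts divided by mean surprisal overall.
    """
    if len(surprisal) != len(is_boundary):
        raise ValueError("surprisal and is_boundary must have equal length")

    at_boundaries = [s for s, b in zip(surprisal, is_boundary) if b]
    if not at_boundaries:
        raise ValueError("no boundaries marked")

    mean_all = sum(surprisal) / len(surprisal)
    if mean_all == 0:
        raise ValueError("mean surprisal is zero; ratio undefined")

    mean_boundary = sum(at_boundaries) / len(at_boundaries)
    return mean_boundary / mean_all


# One boundary placed on the hardest position of a four-byte sequence.
print(boundary_enrichment([4.0, 1.0, 1.0, 2.0], [True, False, False, False]))  # 2.0
```

Under this reading, a router that places boundaries uniformly at random would score close to 1, which gives a natural baseline for comparing routers without inspecting their internals.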
Related papers
- You Can Learn Tokenization End-to-End with Reinforcement Learning [34.662213518530315]
Tokenization is a hardcoded compression step which remains in the training pipeline of Large Language Models (LLMs). We show that these token boundaries can instead be learned using score function estimates, which have tighter theoretical guarantees. We demonstrate that the resultant method outperforms prior proposed straight-through estimates, both qualitatively and quantitatively.
arXiv Detail & Related papers (2026-02-15T00:31:24Z) - BoundMatch: Boundary detection applied to semi-supervised segmentation [12.8995997687175]
Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images. We propose BoundMatch, a novel multi-task SS-SS framework that explicitly integrates semantic boundary detection into a teacher-student consistency regularization pipeline. Our core mechanism, Boundary Consistency Regularized Multi-Task Learning, enforces prediction agreement between teacher and student models on both segmentation masks and detailed semantic boundaries.
arXiv Detail & Related papers (2025-03-30T17:02:26Z) - Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while using no exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z) - Inverse Boundary Value and Optimal Control Problems on Graphs: A Neural and Numerical Synthesis [0.0]
A key piece in the present architecture is our boundary injected message passing neural network.
A regularization technique based on graphical distance is introduced that helps with stabilizing the predictions at nodes far from the boundary.
arXiv Detail & Related papers (2022-06-06T21:26:23Z) - Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [51.59290734837372]
We propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality.
The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark.
By applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard.
arXiv Detail & Related papers (2021-04-12T07:10:48Z) - Active Boundary Loss for Semantic Segmentation [58.72057610093194]
This paper proposes a novel active boundary loss for semantic segmentation.
It can progressively encourage the alignment between predicted boundaries and ground-truth boundaries during end-to-end training.
Experimental results show that training with the active boundary loss can effectively improve the boundary F-score and mean Intersection-over-Union.
arXiv Detail & Related papers (2021-02-04T15:47:54Z) - Think about boundary: Fusing multi-level boundary information for landmark heatmap regression [51.48533538153833]
We study a two-stage but end-to-end approach for exploring the relationship between the facial boundary and landmarks.
We obtain boundary-aware landmark predictions using two modules: the self-calibrated boundary estimation (SCBE) module and the boundary-aware landmark transform (BALT) module.
Our approach outperforms state-of-the-art methods in the literature.
arXiv Detail & Related papers (2020-08-25T10:14:13Z) - CRAUM-Net: Contextual Recursive Attention with Uncertainty Modeling for Salient Object Detection [0.0]
We present a novel framework that integrates multi-scale context aggregation, advanced attention mechanisms, and an uncertainty-aware module for improved SOD performance. Our Adaptive Cross-Scale Context Module effectively fuses features from multiple levels, leveraging Recursive Channel Spatial Attention and Convolutional Block Attention. To train our network robustly, we employ a combination of boundary-sensitive and topology-preserving loss functions, including Boundary IoU, Focal Tversky, and Topological Saliency losses.
arXiv Detail & Related papers (2020-06-04T18:33:59Z) - DeepStrip: High Resolution Boundary Refinement [60.00241966809684]
We propose to convert regions of interest into strip images and compute a boundary prediction in the strip domain.
To detect the target boundary, we present a framework with two prediction layers.
We enforce a matching consistency and C0 continuity regularization to the network to reduce false alarms.
arXiv Detail & Related papers (2020-03-25T22:44:48Z)