Related papers: BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

URL: http://arxiv.org/abs/2506.06072v2
Date: Tue, 10 Jun 2025 15:36:25 GMT
Title: BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Authors: Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov,
Abstract summary: We present the B-spline Encoded Action Sequence Tokenizer (BEAST)<n>BEAST encodes action sequences into compact discrete or continuous tokens using B-splines.<n>We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks.
Score: 20.58336395243977
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently ensures generating smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, and (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks while (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.

Related papers

Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit [45.18582668677648]
We present a training-free method to transplant tokenizers in large language models.<n>We approximate each out-of-vocabulary token as a sparse linear combination of shared tokens.<n>We show that OMP achieves best zero-shot preservation of the base model's performance.
arXiv Detail & Related papers (2025-06-07T00:51:27Z)
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model [59.065471969232284]
We propose a novel Aligned Tokenizer (AliTok) to align the tokenizer and autoregressive model.<n>On ImageNet-256 benchmark, using a standard decoder-only autoregressive model as the generator, AliTok achieves a gFID score of 1.50 and an IS of 305.9.<n>When the parameter count is increased to 662M, AliTok achieves a gFID score of 1.35, surpassing the state-of-the-art diffusion method with 10x faster sampling speed.
arXiv Detail & Related papers (2025-06-05T17:45:10Z)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel.<n>APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation [63.89280381800457]
We propose TokenBridge, which maintains the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens.<n>We introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism.<n>Our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction.
arXiv Detail & Related papers (2025-03-20T17:59:59Z)
BRIDLE: Generalized Self-supervised Learning with Quantization [15.121857164574704]
Self-supervised learning has been a powerful approach for learning meaningful representations from unlabeled data across various domains.<n>Inspired by BERT's success in capturing deep bidirectional contexts in natural language processing, similar frameworks have been adapted to other modalities such as audio.<n>We introduce BRIDLE, a self-supervised pretraining framework that incorporates residual quantization into the bidirectional training process.
arXiv Detail & Related papers (2025-02-04T08:54:06Z)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles [23.134664392314264]
Tokenization is associated with many poorly understood shortcomings in language models (LMs)<n>This work studies how tokenization impacts model performance by analyzing and comparing models with their byte-level counterparts.<n>We introduce the Byte-Token Representation Lemma, a framework that establishes a mapping between the learned token distribution and its equivalent byte-level distribution.
arXiv Detail & Related papers (2024-10-11T23:30:42Z)
SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens [4.5888031410244885]
We propose an acceleration scheme for large language models (LLMs) through Speculative Decoding with Semantic Adaptive Tokens (SDSAT) The primary objective of this design is to enhance the LLM model's ability to generate draft tokens more accurately without compromising its accuracy. Experiments conducted on the CodeLlama-13B and 7B models have yielded speed increases of over 3.5X and 3.0X, respectively.
arXiv Detail & Related papers (2024-03-27T14:54:27Z)
BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization [135.73436686653315]
We are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition( SLR) model. Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone. Pre-training is performed via reconstructing the masked triplet unit from the corrupted input sequence. It adaptively extracts the discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state.
arXiv Detail & Related papers (2023-02-10T06:23:44Z)
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer. It accurately predicts the number of output tokens and extract hidden variables. It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z)
Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learnings. CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
Fast Interleaved Bidirectional Sequence Generation [90.58793284654692]
We introduce a decoder that generates target words from the left-to-right and right-to-left directions simultaneously. We show that we can easily convert a standard architecture for unidirectional decoding into a bidirectional decoder. Our interleaved bidirectional decoder (IBDecoder) retains the model simplicity and training efficiency of the standard Transformer.
arXiv Detail & Related papers (2020-10-27T17:38:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.