BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting
- URL: http://arxiv.org/abs/2601.00698v1
- Date: Fri, 02 Jan 2026 14:27:54 GMT
- Title: BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting
- Authors: Maximilian Reinwardt, Michael Eichelbeck, Matthias Althoff
- Abstract summary: Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching. We introduce the B-Spline Adaptive Tokenizer (BSAT), a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines.
- Score: 11.851496302082722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching, which may be misaligned with the data's semantic structure. In this paper, we introduce the \textit{B-Spline Adaptive Tokenizer (BSAT)}, a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines. BSAT algorithmically places tokens in high-curvature regions and represents each variable-length basis function as a fixed-size token, composed of its coefficient and position. Further, we propose a hybrid positional encoding that combines an additive learnable positional encoding with Rotary Positional Embedding featuring a layer-wise learnable base: L-RoPE. This allows each layer to attend to different temporal dependencies. Our experiments on several public benchmarks show that our model remains competitive, with strong performance at high compression rates. This makes it particularly well-suited for use cases with strong memory constraints.
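A minimal sketch of the adaptive tokenization idea described in the abstract, written against SciPy's smoothing-spline fitter: fit a B-spline whose knots land where the series is hard to approximate, then emit one fixed-size (coefficient, position) token per basis function. The smoothing value, the use of Greville abscissae as token positions, and the token layout are my assumptions, not details taken from the paper.

```python
# Hedged sketch of B-spline-based adaptive tokenization (assumptions noted above).
import numpy as np
from scipy.interpolate import splrep, BSpline

def bspline_tokenize(series, smoothing=1.0, degree=3):
    """Fit a smoothing B-spline and return one (coefficient, position) token per
    basis function; FITPACK inserts extra knots where the fit error is largest."""
    x = np.arange(len(series), dtype=float)
    t, c, k = splrep(x, series, k=degree, s=smoothing)    # knots, coefficients, degree
    n_tokens = len(t) - k - 1                              # number of basis functions
    # Greville abscissae give each (variable-length) basis function a scalar position.
    positions = np.array([t[i + 1:i + k + 1].mean() for i in range(n_tokens)])
    tokens = np.stack([c[:n_tokens], positions], axis=-1)  # shape (n_tokens, 2)
    return tokens, (t, c, k)

# A sharp transient attracts extra knots, hence extra tokens, near it.
x = np.linspace(0.0, 10.0, 512)
y = np.sin(x) + np.exp(-((x - 7.0) ** 2) / 0.05)
tokens, tck = bspline_tokenize(y, smoothing=0.5)
recon = BSpline(*tck)(np.arange(len(y), dtype=float))
print(tokens.shape, float(np.abs(y - recon).max()))        # compression vs. reconstruction error
```

In the same hedged spirit, a rotary positional embedding with a layer-wise learnable base, matching the abstract's description of L-RoPE only at the level of the idea; the log-space parameterization and one-module-per-layer setup are assumptions.

```python
# Hedged sketch of a rotary positional embedding with a learnable base per layer.
import torch
import torch.nn as nn

class LayerwiseRoPE(nn.Module):
    def __init__(self, head_dim, init_base=10000.0):
        super().__init__()
        self.head_dim = head_dim
        # Learn the base in log-space so it stays positive; instantiating one module
        # per layer lets each layer settle on a different frequency range.
        self.log_base = nn.Parameter(torch.tensor(float(init_base)).log())

    def forward(self, x, positions):
        # x: (..., seq_len, head_dim); positions: (seq_len,) float token positions
        base = self.log_base.exp()
        inv_freq = base ** (-torch.arange(0, self.head_dim, 2, dtype=torch.float32) / self.head_dim)
        angles = positions[:, None] * inv_freq[None, :]     # (seq_len, head_dim/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
        return out.flatten(-2)

rope = LayerwiseRoPE(head_dim=64)                           # one instance per layer
q = torch.randn(2, 8, 128, 64)                              # (batch, heads, seq, head_dim)
print(rope(q, torch.arange(128, dtype=torch.float32)).shape)
```

Because the token positions in the first sketch are the spline's Greville abscissae rather than uniform integer indices, the same rotary formula applies unchanged to non-uniformly placed tokens.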
Related papers
- KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem [12.668341559890605]
We propose KnapSpec, a training-free framework that reformulates draft model selection as a knapsack problem to maximize tokens-per-time throughput. We provide the first rigorous theoretical analysis establishing cosine similarity between hidden states as a mathematically sound proxy for the token acceptance rate. Our experiments on Qwen3 and Llama3 demonstrate that KnapSpec consistently outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-23T08:13:03Z)
- EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting [50.794700596484894]
We propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that dynamically detects transition points via conditional entropy. This preserves temporal structure while retaining the computational benefits of patching. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency.
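A toy, hedged illustration of the entropy-guided boundary idea above (my simplification, not EntroPE's actual estimator): quantize the series, estimate the one-step conditional entropy in a sliding window, and propose a patch boundary wherever the entropy jumps.

```python
# Hedged sketch: conditional-entropy spikes as candidate patch boundaries.
import numpy as np

def entropy_boundaries(series, n_bins=16, window=64, z_thresh=1.5):
    edges = np.histogram_bin_edges(series, bins=n_bins)
    sym = np.digitize(series, edges)                       # symbols in 0..n_bins+1
    ent = np.full(len(series), np.nan)
    for t in range(window, len(series)):
        prev, nxt = sym[t - window:t - 1], sym[t - window + 1:t]
        joint = np.zeros((n_bins + 2, n_bins + 2))
        np.add.at(joint, (prev, nxt), 1.0)
        joint /= joint.sum()
        p_prev = joint.sum(axis=1, keepdims=True)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = joint * np.log2(np.where(joint > 0, joint / p_prev, 1.0))
        ent[t] = -np.nansum(terms)                         # H(x_t | x_{t-1}) in the window
    mu, sd = np.nanmean(ent), np.nanstd(ent)
    return np.where(ent > mu + z_thresh * sd)[0]           # candidate boundaries

t = np.linspace(0.0, 8.0 * np.pi, 1024)
series = np.concatenate([np.sin(t[:512]), np.sign(np.sin(3.0 * t[512:]))])
print(entropy_boundaries(series)[:10])                     # prints candidate boundary indices
```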
arXiv Detail & Related papers (2025-09-30T12:09:56Z)
- Kairos: Towards Adaptive and Generalizable Time Series Foundation Models [27.076542021368056]
Time series foundation models (TSFMs) have emerged as a powerful paradigm for time series analysis. We propose Kairos, a flexible TSFM framework that integrates a dynamic patching tokenizer and an instance-adaptive positional embedding. Kairos achieves superior performance with far fewer parameters on two common zero-shot benchmarks.
arXiv Detail & Related papers (2025-09-30T06:02:26Z)
- Set Block Decoding is a Language Model Inference Accelerator [48.061016901663386]
We introduce Set Block Decoding (SBD), a simple and flexible paradigm that accelerates generation by integrating standard next token prediction (NTP) and masked token prediction (MATP) within a single architecture. SBD allows the model to sample multiple, not necessarily consecutive, future tokens in parallel, a key distinction from previous acceleration methods. We demonstrate that SBD enables a 3-5x reduction in the number of forward passes required for generation while achieving the same performance as equivalent NTP training.
arXiv Detail & Related papers (2025-09-04T13:02:39Z)
- SeqPE: Transformer with Sequential Position Encoding [76.22159277300891]
SeqPE represents each $n$-dimensional position index as a symbolic sequence and employs a lightweight sequential position encoder to learn their embeddings. Experiments across language modeling, long-context question answering, and 2D image classification demonstrate that SeqPE not only surpasses strong baselines in perplexity, exact match (EM), and accuracy, but also enables seamless generalization to multi-dimensional inputs without requiring manual architectural redesign.
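A hedged sketch of the general idea only (not SeqPE's actual encoder): spell a position index out as its digit sequence and embed it with a small learned sequence model, so indices far beyond the training range still decompose into familiar digits.

```python
# Hedged sketch: encode integer positions via their digit sequences.
import torch
import torch.nn as nn

class DigitPositionEncoder(nn.Module):
    def __init__(self, d_model=64, max_digits=6):
        super().__init__()
        self.max_digits = max_digits
        self.digit_emb = nn.Embedding(10, d_model)          # one embedding per decimal digit
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, positions):
        # positions: (n,) integer tensor -> (n, d_model) position embeddings
        digits = torch.stack(
            [(positions // 10 ** i) % 10 for i in range(self.max_digits)], dim=-1
        )                                                    # least-significant digit first
        h = self.digit_emb(digits)                           # (n, max_digits, d_model)
        _, last = self.rnn(h)                                # final hidden state summarizes the digits
        return last.squeeze(0)

enc = DigitPositionEncoder()
print(enc(torch.arange(0, 500000, 100000)).shape)            # indices far beyond typical training lengths
```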
arXiv Detail & Related papers (2025-06-16T09:16:40Z)
- Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding [89.52931576290976]
We present conTextualized equivariAnt Position Encoding (TAPE), a novel framework that enhances positional embeddings by incorporating sequence content across layers. Our method can be easily integrated into pre-trained transformers, offering parameter-efficient fine-tuning with minimal overhead.
arXiv Detail & Related papers (2025-01-01T03:23:00Z)
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training [51.23520027773028]
Extending context window sizes allows large language models to process longer sequences and handle more complex tasks.
We observe that using RoPE with BFloat16 format results in numerical issues, causing it to deviate from its intended relative positional encoding.
We develop AnchorAttention, a plug-and-play attention method that alleviates numerical issues caused by BFloat16.
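A small, hedged demonstration of one contributing effect (my illustration, not the paper's analysis): BFloat16 carries only 8 bits of mantissa, so integer positions above 256 are rounded, and a rotary angle computed from a rounded position no longer differs from its neighbors by an exact multiple of the frequency, which undermines the purely relative behavior.

```python
# Hedged illustration: position indices and rotary angles lose exactness in bfloat16.
import torch

theta = 0.01                                               # one RoPE frequency, for illustration
for p in (100, 1000, 10000, 100000):
    pos32 = torch.tensor(float(p), dtype=torch.float32)
    pos16 = torch.tensor(float(p), dtype=torch.bfloat16)
    ang32 = (pos32 * theta).cos().item()
    ang16 = (pos16.to(torch.float32) * theta).cos().item()
    print(f"pos={p:>6}  stored_bf16={pos16.item():>9.1f}  "
          f"cos(fp32)={ang32:+.6f}  cos(from bf16 pos)={ang16:+.6f}")
```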
arXiv Detail & Related papers (2024-11-20T17:22:31Z)
- Factorizers for Distributed Sparse Block Codes [45.29870215671697]
We propose a fast and highly accurate method for factorizing distributed sparse block codes (SBCs).
Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an $\ell_\infty$-based similarity metric.
We demonstrate the feasibility of our method on four deep CNN architectures over CIFAR-100, ImageNet-1K, and RAVEN datasets.
arXiv Detail & Related papers (2023-03-24T12:31:48Z)
- Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis [72.71398034617607]
We dissect a relative positional embedding design, ALiBi, via the lens of receptive field analysis.
We modify the vanilla sinusoidal positional embedding to create the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence.
arXiv Detail & Related papers (2022-12-20T15:40:17Z)
- Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection [23.39962989492527]
Transformer-based language models such as BERT have achieved the state-of-the-art on various NLP tasks, but are computationally prohibitive.
We present Pyramid-BERT, where we replace previously used heuristics with a core-set based token selection method justified by theoretical results.
The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and makes the model suitable for handling longer sequence lengths.
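As a hedged sketch, one standard core-set construction (k-center greedy, also known as farthest-point sampling) applied to a layer's token embeddings; Pyramid-BERT's exact selection rule and theoretical guarantees may differ.

```python
# Hedged sketch: keep the k tokens whose embeddings best cover all the others.
import numpy as np

def kcenter_greedy(token_embs, k):
    """Greedy k-center: each step adds the token farthest from the current core-set,
    shrinking the maximum distance from any token to its nearest kept token."""
    selected = [0]                                          # seed with the first token (e.g. [CLS])
    dist = np.linalg.norm(token_embs - token_embs[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(token_embs - token_embs[nxt], axis=1))
    return sorted(selected)

tokens = np.random.randn(128, 768).astype(np.float32)       # one layer's token embeddings
keep = kcenter_greedy(tokens, k=32)                         # indices of tokens passed to the next layer
print(len(keep), keep[:8])
```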
arXiv Detail & Related papers (2022-03-27T19:52:01Z)
- Conformer-Kernel with Query Term Independence for Document Retrieval [32.36908635150144]
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark.
We extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
We show that the Conformer's GPU memory requirement scales linearly with input sequence length, making it a more viable option when ranking long documents.
arXiv Detail & Related papers (2020-07-20T19:47:28Z)