Related papers: TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

URL: http://arxiv.org/abs/2405.16847v2
Date: Mon, 25 Aug 2025 06:29:17 GMT
Title: TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation
Authors: Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu,
Abstract summary: We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives.<n> TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space.<n>We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
Score: 65.65530016765615
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neuron segmentation from electron microscopy (EM) volumes is crucial for understanding brain circuits, yet the complex neuronal structures in high-resolution EM images present significant challenges. EM data exhibits unique characteristics including high noise levels, anisotropic voxel dimensions, and ultra-long spatial dependencies that make traditional vision models inadequate. Inspired by autoregressive pretraining in language models, we propose TokenUnify, a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives. TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space with emergent properties. From an information-theoretic perspective, these three tasks are complementary and provide optimal coverage of visual data structure, with our approach reducing autoregressive error accumulation from O(K) to O(sqrt(K)) for sequences of length K. We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity. Leveraging the Mamba architecture's linear-time sequence modeling capabilities, TokenUnify achieves a 44% performance improvement on downstream neuron segmentation and outperforms MAE by 25%. Our approach demonstrates superior scaling properties as model size increases, effectively bridging the gap between pretraining strategies for language and vision models.

Related papers

A Systematic Evaluation of Sample-Level Tokenization Strategies for MEG Foundation Models [6.336623115095147]
Recent success in natural language processing has motivated growing interest in large-scale foundation models for neuroimaging data.<n>The impact of different tokenization strategies for neural data is currently poorly understood.<n>We present a systematic evaluation of sample-level tokenization strategies for transformer-based large neuroimaging models applied to magnetoencephalography (MEG) data.
arXiv Detail & Related papers (2026-02-18T17:21:02Z)
Optimal message passing for molecular prediction is simple, attentive and spatial [0.0]
Strategies to improve the predicting performance of Message-Passing Neural-Networks for molecular property predictions can be achieved.<n>We design models that achieve state-of-the-art performance, surpassing more complex models such as those pre-trained on external databases.
arXiv Detail & Related papers (2025-09-13T15:55:02Z)
Bridging Graph and State-Space Modeling for Intensive Care Unit Length of Stay Prediction [14.897714298219569]
We propose S$2$G-Net, a novel neural architecture that unifies state-space sequence modeling with multi-view Graph Neural Networks (GNNs) for ICU LOS prediction.<n> Experiments on the large-scale MIMIC-IV cohort dataset show that S$2$G-Net consistently outperforms sequence models.
arXiv Detail & Related papers (2025-08-24T23:32:25Z)
SpikeSTAG: Spatial-Temporal Forecasting via GNN-SNN Collaboration [16.754715227269525]
Spiking neural networks (SNNs) offer a distinctive approach for capturing the complexities of temporal data.<n>We introduce a brand new SNN architecture, which is among the first to seamlessly integrate graph structural learning with spike-based temporal processing.<n> Experiments show that our model surpasses the state-of-the-art SNN-based iSpikformer on all datasets.
arXiv Detail & Related papers (2025-08-04T05:17:52Z)
CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models.<n>We present CodeBrain, a two-stage EFM designed to fill this gap.<n>In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens.<n>In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z)
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit [45.18582668677648]
We present a training-free method to transplant tokenizers in large language models.<n>We approximate each out-of-vocabulary token as a sparse linear combination of shared tokens.<n>We show that OMP achieves best zero-shot preservation of the base model's performance.
arXiv Detail & Related papers (2025-06-07T00:51:27Z)
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation [63.89280381800457]
We propose TokenBridge, which maintains the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens. We introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism. Our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction.
arXiv Detail & Related papers (2025-03-20T17:59:59Z)
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition [5.575078692353885]
We propose a new model for multi-token prediction in transformers, aiming to enhance sampling efficiency without compromising accuracy. By generalizing it to a rank-$r$ canonical probability decomposition, we develop an improved model that predicts multiple tokens simultaneously.
arXiv Detail & Related papers (2024-10-23T11:06:36Z)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles [23.134664392314264]
Tokenization is associated with many poorly understood shortcomings in language models (LM) This work studies how tokenization impacts model performance by analyzing and comparing models with their byte-level counterparts. We develop a next-byte sampling algorithm that eliminates tokenization bias without requiring further training or optimization.
arXiv Detail & Related papers (2024-10-11T23:30:42Z)
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation [60.80423207808076]
Capturing long-range dependencies while preserving high-resolution visual representations is crucial for dense prediction tasks such as human pose estimation.<n>We propose the Dynamic Visual State Space (DVSS) block, which augments visual state space models with multi-scale convolutional operations.<n>We build HRVMamba, a novel model for efficient high-resolution representation learning.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
Spatiotemporal Graph Learning with Direct Volumetric Information Passing and Feature Enhancement [62.91536661584656]
We propose a dual-module framework, Cell-embedded and Feature-enhanced Graph Neural Network (aka, CeFeGNN) for learning.<n>We embed learnable cell attributions to the common node-edge message passing process, which better captures the spatial dependency of regional features.<n>Experiments on various PDE systems and one real-world dataset demonstrate that CeFeGNN achieves superior performance compared with other baselines.
arXiv Detail & Related papers (2024-09-26T16:22:08Z)
Semformer: Transformer Language Models with Semantic Planning [18.750863564495006]
Next-token prediction serves as the dominant component in current neural language models. We introduce Semformer, a novel method of training a Transformer language model that explicitly models the semantic planning of response.
arXiv Detail & Related papers (2024-09-17T12:54:34Z)
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens [1.2549198550400134]
Large language models (LLMs) are extensively used, but there are concerns regarding privacy, security, and copyright due to their opaque training data. Current solutions to this problem leverage techniques explored in machine learning privacy such as Membership Inference Attacks (MIAs) We propose an adaptive pre-training data detection method which alleviates this reliance and effectively amplify the identification.
arXiv Detail & Related papers (2024-07-30T23:43:59Z)
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs) We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model. We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes. Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts. We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce Data [54.55126643084341]
Existing deep learning solutions suffer from three major limitations. We introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network. Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets.
arXiv Detail & Related papers (2024-01-01T10:20:01Z)
Data-driven modelling of brain activity using neural networks, Diffusion Maps, and the Koopman operator [0.0]
We propose a machine-learning approach to model long-term out-of-sample dynamics of brain activity from task-dependent fMRI data. We use Diffusion maps (DMs) to discover a set of variables that parametrize the low-dimensional manifold on which the emergent high-dimensional fMRI time series evolve. We construct reduced-order-models (ROMs) on the embedded manifold via two techniques: Feedforward Neural Networks (FNNs) and the Koopman operator.
arXiv Detail & Related papers (2023-04-24T09:08:12Z)
Multi-Level Contrastive Learning for Dense Prediction Task [59.591755258395594]
We present Multi-Level Contrastive Learning for Dense Prediction Task (MCL), an efficient self-supervised method for learning region-level feature representation for dense prediction tasks. Our method is motivated by the three key factors in detection: localization, scale consistency and recognition. Our method consistently outperforms the recent state-of-the-art methods on various datasets with significant margins.
arXiv Detail & Related papers (2023-04-04T17:59:04Z)
Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
Key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences. We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., Pairwise Masked Language Model (PMLM) Our result shows that the proposed method can effectively capture the interresidue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting. The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count. Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
arXiv Detail & Related papers (2020-09-14T13:20:12Z)
Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting [12.0855096102517]
We present a representation learning framework for financial time series forecasting. In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements.
arXiv Detail & Related papers (2020-02-18T15:24:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.