Partition Generative Modeling: Masked Modeling Without Masks
- URL: http://arxiv.org/abs/2505.18883v2
- Date: Thu, 09 Oct 2025 20:59:57 GMT
- Title: Partition Generative Modeling: Masked Modeling Without Masks
- Authors: Justin Deschenaux, Lan Tran, Caglar Gulcehre,
- Abstract summary: Masked generative models (MGMs) are widely used to capture complex data and enable faster generation than autoregressive models (AR)<n>In this work, we introduce the Partition Generative Model (PGM), a novel approach that combines the strengths of AR and MGMs.<n>On OpenWebText, PGMs offer at least $5times$ improvements in sampling latency and throughput, while producing samples with superior Generative Perplexity.
- Score: 10.751153162476726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked generative models (MGMs) are widely used to capture complex data and enable faster generation than autoregressive models (AR) through parallel decoding. However, MGMs typically operate on fixed-length inputs, which can be inefficient: early in sampling, most tokens are masked and carry no information, leading to wasted computation. In contrast, AR models process only tokens generated previously, making early iterations faster. In this work, we introduce the Partition Generative Model (PGM), a novel approach that combines the strengths of AR and MGMs. Rather than masking, PGM partitions tokens into two groups and employs sparse attention to block information flow between them. Since there is no information flow between partitions, the model can process the previously-generated tokens only during sampling, while retaining the ability to generate tokens in parallel and in any order. On OpenWebText, PGMs offer at least $5\times$ improvements in sampling latency and throughput, while producing samples with superior Generative Perplexity, compared to Masked Diffusion Language Models. On ImageNet, PGMs achieve a $7.5\times$ higher throughput than MaskGIT, with only a slight increase in FID (5.54 vs. 5.35). With twice as many sampling steps, the FID reduces to 4.56 while while being $3.9\times$ faster than MaskGIT. Finally, PGMs integrate seamlessly with MGM distillation, providing further inference speedups.
Related papers
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel.<n>APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z) - Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking [17.511240770486452]
Masked diffusion models (MDMs) have shown competitive performance compared to autoregressive models (ARMs) for language modeling.<n>We introduce EB-Sampler, a drop-in replacement for existing samplers, utilizing an Entropy Bounded unmasking procedure.<n> EB-Sampler accelerates sampling from current state of the art MDMs by roughly 2-3x on standard coding and math reasoning benchmarks without loss in performance.
arXiv Detail & Related papers (2025-05-30T17:52:55Z) - Plug-and-Play Context Feature Reuse for Efficient Masked Generation [36.563229330549284]
Masked generative models (MGMs) have emerged as a powerful framework for image synthesis.<n>We introduce ReCAP (Reused Context-Aware Prediction), a plug-and-play module that accelerates inference in MGMs.
arXiv Detail & Related papers (2025-05-25T10:57:35Z) - Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking [17.371579113481644]
Masked diffusion models (MDM) are powerful generative models for discrete data that generate samples by progressively unmasking tokens in a sequence.<n>We propose the Partial masking scheme (Prime), which augments MDM by allowing tokens to take intermediate states between the masked and unmasked states.<n>Our method demonstrates superior performance across a diverse set of generative modeling tasks.
arXiv Detail & Related papers (2025-05-24T04:16:40Z) - Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions [41.45689715854447]
We introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence.<n>ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences.
arXiv Detail & Related papers (2025-05-09T03:29:15Z) - Enhancing DNA Foundation Models to Address Masking Inefficiencies [18.54660252939211]
We propose a modified encoder-decoder architecture based on the masked autoencoder framework.<n>We evaluate our approach on the BIOSCAN-5M dataset, comprising over 2 million unique DNA barcodes.
arXiv Detail & Related papers (2025-02-25T17:56:25Z) - FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z) - AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration [0.3626013617212667]
We introduce AMUSD (Asynchronous Multi-device Speculative Decoding), a system that accelerates generation by decoupling the draft and verify phases.
Unlike conventional speculative decoding, where only one model (draft or verify) performs token generation at a time, AMUSD enables both models to perform predictions independently on separate devices.
We evaluate our approach over multiple datasets and show that AMUSD achieves an average 29% improvement over speculative decoding and up to 1.96$times$ speedup over conventional autoregressive decoding.
arXiv Detail & Related papers (2024-10-22T19:15:35Z) - Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend the SAM to Few-shot Semantic segmentation (FSS)
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Masked Audio Generation using a Single Non-Autoregressive Transformer [90.11646612273965]
MAGNeT is a masked generative sequence modeling method that operates directly over several streams of audio tokens.
We demonstrate the efficiency of MAGNeT for the task of text-to-music and text-to-audio generation.
We shed light on the importance of each of the components comprising MAGNeT, together with pointing to the trade-offs between autoregressive and non-autoregressive modeling.
arXiv Detail & Related papers (2024-01-09T14:29:39Z) - Representation Deficiency in Masked Language Modeling [107.39136254013042]
We propose MAE-LM, which pretrains the Masked Autoencoder architecture with where $tt[MASK]$ tokens are excluded from the encoder.
We show that MAE-LM consistently outperforms pretrained models across different pretraining settings and model sizes when fine-tuned on the GLUE and SQuAD benchmarks.
arXiv Detail & Related papers (2023-02-04T01:54:17Z) - Masked Vector Quantization [19.858406923144404]
Generative models with discrete latent representations have recently demonstrated an impressive ability to learn complex data distributions.
We propose the Masked Vector Quantization (MVQ) framework which increases the representational capacity of each code vector by learning mask configurations.
MVQ reduces FID in existing vector quantization architectures by up to $68%$ at 2 tokens per instance and $57%$ at 5 tokens.
arXiv Detail & Related papers (2023-01-16T22:30:53Z) - Fast Inference from Transformers via Speculative Decoding [3.950600027250452]
Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model.
In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any changes to the outputs, by computing several tokens in parallel.
arXiv Detail & Related papers (2022-11-30T17:33:28Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - Paraformer: Fast and Accurate Parallel Transformer for
Non-autoregressive End-to-End Speech Recognition [62.83832841523525]
We propose a fast and accurate parallel transformer, termed Paraformer.
It accurately predicts the number of output tokens and extract hidden variables.
It can attain comparable performance to the state-of-the-art AR transformer, with more than 10x speedup.
arXiv Detail & Related papers (2022-06-16T17:24:14Z) - Extreme Masking for Learning Instance and Distributed Visual
Representations [50.152264456036114]
The paper presents a scalable approach for learning distributed representations over individual tokens and a holistic instance representation simultaneously.
We use self-attention blocks to represent distributed tokens, followed by cross-attention blocks to aggregate the holistic instance.
Our model, named ExtreMA, follows the plain BYOL approach where the instance representation from the unmasked subset is trained to predict that from the intact input.
arXiv Detail & Related papers (2022-06-09T17:59:43Z) - Pre-training Co-evolutionary Protein Representation via A Pairwise
Masked Language Model [93.9943278892735]
Key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., Pairwise Masked Language Model (PMLM)
Our result shows that the proposed method can effectively capture the interresidue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z) - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.