The Collapse of Patches
- URL: http://arxiv.org/abs/2511.22281v1
- Date: Thu, 27 Nov 2025 10:04:44 GMT
- Title: The Collapse of Patches
- Authors: Wei Guo, Shunqi Mao, Zhuonan Liang, Heng Wang, Weidong Cai
- Abstract summary: Patch collapse is analogous to collapsing a particle's wave function in quantum mechanics. To identify which patches are most relied on during a target region's collapse, we learn an autoencoder that softly selects a subset of patches to reconstruct each target patch. We show that respecting the PageRank-derived patch order benefits various masked image modeling methods.
- Score: 15.500261107186441
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Observing certain patches in an image reduces the uncertainty of others. Their realization lowers the distribution entropy of each remaining patch feature, analogous to collapsing a particle's wave function in quantum mechanics. This phenomenon can intuitively be called patch collapse. To identify which patches are most relied on during a target region's collapse, we learn an autoencoder that softly selects a subset of patches to reconstruct each target patch. Graphing these learned dependencies for each patch's PageRank score reveals the optimal patch order to realize an image. We show that respecting this order benefits various masked image modeling methods. First, autoregressive image generation can be boosted by retraining the state-of-the-art model MAR. Next, we introduce a new setup for image classification by exposing Vision Transformers only to high-rank patches in the collapse order. Seeing 22% of such patches is sufficient to achieve high accuracy. With these experiments, we propose patch collapse as a novel image modeling perspective that promotes vision efficiency. Our project is available at https://github.com/wguo-ai/CoP .
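The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dependency matrix below is random placeholder data standing in for the selection weights a trained autoencoder would produce, the function names `pagerank` and `collapse_order` are hypothetical, the damping factor 0.85 is the conventional PageRank default rather than a value from the paper, and the edge direction (rank flows from a target patch to the patches it relies on) is an assumption.

```python
import numpy as np


def pagerank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on a dense, non-negative weight matrix.

    weights[i, j] is how strongly target patch i relies on source patch j
    (assumed convention), so rank flows from i to j and patches that many
    others rely on end up with high scores.
    """
    n = weights.shape[0]
    row_sums = weights.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows with no outgoing weight spread their rank uniformly.
    transition = np.where(row_sums > 0, weights / safe_sums, 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = (1.0 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


def collapse_order(weights, keep_fraction=0.22):
    """Return all patch indices sorted by descending rank, plus the top slice."""
    scores = pagerank(weights)
    order = np.argsort(-scores, kind="stable")
    n_keep = max(1, int(round(keep_fraction * len(order))))
    return order, order[:n_keep]


if __name__ == "__main__":
    # Placeholder dependencies for a 4x4 grid of patches (16 patches).
    rng = np.random.default_rng(0)
    deps = rng.random((16, 16))
    np.fill_diagonal(deps, 0.0)  # a patch does not reconstruct itself
    order, kept = collapse_order(deps)
    print("collapse order:", order.tolist())
    print("patches kept at 22%:", kept.tolist())
```

In a classification setup like the one the abstract mentions, only the `kept` indices would be passed to the Vision Transformer. How the paper builds the graph from the autoencoder's soft selections may differ from this sketch.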
Related papers
- REOrdering Patches Improves Vision Models [58.8295093799148]
We show that patch order significantly affects model performance in such settings. We propose REOrder, a framework for discovering task-optimal patch orderings. REOrder improves top-1 accuracy over row-major ordering on ImageNet-1K by up to 3.01% and on Functional Map of the World by 13.35%.
arXiv Detail & Related papers (2025-05-29T17:59:30Z) - Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More [34.12661784331014]
We study the information loss caused by the patchification-based compressive encoding paradigm. We conduct extensive patch size scaling experiments and excitedly observe an intriguing scaling law in patchification. As a by-product, we discover that with smaller patches, task-specific decoder heads become less critical for dense prediction.
arXiv Detail & Related papers (2025-02-06T03:01:38Z) - Next Patch Prediction for Autoregressive Visual Generation [58.73461205369825]
We extend the Next Token Prediction (NTP) paradigm to a novel Next Patch Prediction (NPP) paradigm. Our key idea is to group and aggregate image tokens into patch tokens with higher information density. We show that NPP could reduce the training cost to around 0.6 times while improving image generation quality by up to 1.0 FID score on the ImageNet 256x256 generation benchmark.
arXiv Detail & Related papers (2024-12-19T18:59:36Z) - Learning to Rank Patches for Unbiased Image Redundancy Reduction [80.93989115541966]
Images suffer from heavy spatial redundancy because pixels in neighboring regions are spatially correlated.
Existing approaches strive to overcome this limitation by reducing less meaningful image regions.
We propose a self-supervised framework for image redundancy reduction called Learning to Rank Patches.
arXiv Detail & Related papers (2024-03-31T13:12:41Z) - Learning to Embed Time Series Patches Independently [5.752266579415516]
Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series.
We argue that capturing such patch dependencies might not be an optimal strategy for time series representation learning.
We propose to use 1) the simple patch reconstruction task, which autoencodes each patch without looking at other patches, and 2) the simple patch-wise reconstruction that embeds each patch independently.
arXiv Detail & Related papers (2023-12-27T06:23:29Z) - Learning to Represent Patches [7.073203009308308]
We introduce a novel method, Patcherizer, to bridge the gap between deep learning for patch representation and semantic intent.
Patcherizer employs graph convolutional neural networks for structural intention graph representation and transformers for intention sequence representation.
Our experiments demonstrate the representation's efficacy across all tasks, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2023-08-31T09:34:38Z) - PATS: Patch Area Transportation with Subdivision for Local Feature Matching [78.67559513308787]
Local feature matching aims at establishing sparse correspondences between a pair of images.
We propose Patch Area Transportation with Subdivision (PATS) to tackle this issue.
PATS improves both matching accuracy and coverage, and shows superior performance in downstream tasks.
arXiv Detail & Related papers (2023-03-14T08:28:36Z) - Generating natural images with direct Patch Distributions Matching [7.99536002595393]
We develop an algorithm that explicitly and efficiently minimizes the distance between patch distributions in two images.
Our results are often superior to single-image-GANs, require no training, and can generate high quality images in a few seconds.
arXiv Detail & Related papers (2022-03-22T16:38:52Z) - HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z) - A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z) - SimPatch: A Nearest Neighbor Similarity Match between Image Patches [0.0]
We try to use large patches instead of relatively small patches so that each patch contains more information.
We use different feature extraction mechanisms to extract the features of each individual image patch, which together form a feature matrix.
For a query patch in a given image, the nearest patches are computed using two different nearest neighbor algorithms.
arXiv Detail & Related papers (2020-08-07T10:51:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.