AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
- URL: http://arxiv.org/abs/2506.06962v3
- Date: Sat, 14 Jun 2025 00:40:34 GMT
- Title: AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
- Authors: Jingyuan Qi, Zhiyang Xu, Qifan Wang, Lifu Huang,
- Abstract summary: We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating knearest neighbor retrievals at the patch level.<n>We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval and DPG-Bench.
- Score: 35.008697736838194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating knearest neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step, using prior-generated patches as queries to retrieve and incorporate the most relevant patch-level visual references, enabling the model to respond to evolving generation needs while avoiding limitations (e.g., over-copying, stylistic bias, etc.) prevalent in existing methods. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-efficient fine-tuning method that progressively smooths the features of retrieved patches via multi-scale convolution operations and leverages them to augment the image generation process. We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval and DPG-Bench, demonstrating significant performance gains over state-of-the-art image generation models.
Related papers
- RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors [54.81109375939306]
RGE-GS is a novel expansive reconstruction framework that synergizes diffusion-based generation with reward-guided Gaussian integration.<n>We propose a reward network that learns to identify and prioritize consistently generated patterns prior to reconstruction phases.<n>During the reconstruction process, we devise a differentiated training strategy that automatically adjust Gaussian optimization progress according to scene converge metrics.
arXiv Detail & Related papers (2025-06-28T08:02:54Z) - ImpRAG: Retrieval-Augmented Generation with Implicit Queries [49.510101132093396]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.<n>We show that ImpRAG achieves 3.6-11.5 improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z) - ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL [54.100889131719626]
Chain-of-thought reasoning and reinforcement learning have driven breakthroughs in NLP.<n>We introduce ReasonGen-R1, a framework that imbues an autoregressive image generator with explicit text-based "thinking" skills.<n>We show that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models.
arXiv Detail & Related papers (2025-05-30T17:59:48Z) - Policy Optimized Text-to-Image Pipeline Design [72.87655664038617]
We introduce a novel reinforcement learning-based framework for text-to-image generation.<n>Our approach first trains an ensemble of reward models capable of predicting image quality scores directly from prompt-workflow combinations.<n>We then implement a two-phase training strategy: initial vocabulary training followed by GRPO-based optimization.
arXiv Detail & Related papers (2025-05-27T17:50:47Z) - RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [27.307331773270676]
latent diffusion models (LDMs) have significantly improved the perceptual quality of All-in-One image Restoration (AiOR) methods.<n>These LDM-based frameworks suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications.<n>We propose a novel generative approach for AiOR that significantly outperforms LDM-based models in restoration performance while achieving over $mathbf10times$ faster inference.
arXiv Detail & Related papers (2025-05-23T15:52:26Z) - TensorAR: Refinement is All You Need in Autoregressive Image Generation [45.38495724606076]
Autoregressive (AR) image generators offer a language-model-friendly approach to image generation by predicting discrete image tokens in a causal sequence.<n>Unlike diffusion models, AR models lack a mechanism to refine previous predictions, limiting their generation quality.<n>In this paper, we introduce a new AR paradigm that reformulates image generation from next-token prediction to next-tensor prediction.
arXiv Detail & Related papers (2025-05-22T07:27:25Z) - Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps [16.84310001807895]
This paper introduces a model-agnostic approach that can be applied to A-RAG methods.<n>Specifically, we use cache access and parallel generation to speed up the prefilling and decoding stages respectively.
arXiv Detail & Related papers (2025-05-19T05:39:38Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.<n>We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.<n>We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction.
Few-shot methods often struggle with poor reconstruction quality in vast environments.
This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z) - Randomized Autoregressive Visual Generation [26.195148077398223]
This paper presents Randomized AutoRegressive modeling (RAR) for visual generation.
RAR sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks.
On the ImageNet-256 benchmark, RAR achieves an FID score of 1.48, not only surpassing prior state-the-art autoregressive image generators but also outperforming leading diffusion-based and masked transformer-based methods.
arXiv Detail & Related papers (2024-11-01T17:59:58Z) - DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image
Restoration [5.573836220587265]
This work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR)
DRM-IR consists of task-adaptive degradation modeling and model-based image restoring.
Experiments on multiple benchmark datasets show that our DRM-IR achieves state-of-the-art in All-In-One IR.
arXiv Detail & Related papers (2023-07-15T02:42:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.