Related papers: AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

URL: http://arxiv.org/abs/2506.06962v3
Date: Sat, 14 Jun 2025 00:40:34 GMT
Title: AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Authors: Jingyuan Qi, Zhiyang Xu, Qifan Wang, Lifu Huang,
Abstract summary: We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating knearest neighbor retrievals at the patch level.<n>We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval and DPG-Bench.
Score: 35.008697736838194
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating knearest neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step, using prior-generated patches as queries to retrieve and incorporate the most relevant patch-level visual references, enabling the model to respond to evolving generation needs while avoiding limitations (e.g., over-copying, stylistic bias, etc.) prevalent in existing methods. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with the distribution of retrieved patches, and (2) Feature-Augmentation in Decoding (FAiD), a parameter-efficient fine-tuning method that progressively smooths the features of retrieved patches via multi-scale convolution operations and leverages them to augment the image generation process. We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval and DPG-Bench, demonstrating significant performance gains over state-of-the-art image generation models.

Related papers

RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors [54.81109375939306]
RGE-GS is a novel expansive reconstruction framework that synergizes diffusion-based generation with reward-guided Gaussian integration.<n>We propose a reward network that learns to identify and prioritize consistently generated patterns prior to reconstruction phases.<n>During the reconstruction process, we devise a differentiated training strategy that automatically adjust Gaussian optimization progress according to scene converge metrics.
arXiv Detail & Related papers (2025-06-28T08:02:54Z)
ImpRAG: Retrieval-Augmented Generation with Implicit Queries [49.510101132093396]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.<n>We show that ImpRAG achieves 3.6-11.5 improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z)
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL [54.100889131719626]
Chain-of-thought reasoning and reinforcement learning have driven breakthroughs in NLP.<n>We introduce ReasonGen-R1, a framework that imbues an autoregressive image generator with explicit text-based "thinking" skills.<n>We show that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models.
arXiv Detail & Related papers (2025-05-30T17:59:48Z)
Policy Optimized Text-to-Image Pipeline Design [72.87655664038617]
We introduce a novel reinforcement learning-based framework for text-to-image generation.<n>Our approach first trains an ensemble of reward models capable of predicting image quality scores directly from prompt-workflow combinations.<n>We then implement a two-phase training strategy: initial vocabulary training followed by GRPO-based optimization.
arXiv Detail & Related papers (2025-05-27T17:50:47Z)
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [27.307331773270676]
latent diffusion models (LDMs) have significantly improved the perceptual quality of All-in-One image Restoration (AiOR) methods.<n>These LDM-based frameworks suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications.<n>We propose a novel generative approach for AiOR that significantly outperforms LDM-based models in restoration performance while achieving over $mathbf10times$ faster inference.
arXiv Detail & Related papers (2025-05-23T15:52:26Z)
TensorAR: Refinement is All You Need in Autoregressive Image Generation [45.38495724606076]
Autoregressive (AR) image generators offer a language-model-friendly approach to image generation by predicting discrete image tokens in a causal sequence.<n>Unlike diffusion models, AR models lack a mechanism to refine previous predictions, limiting their generation quality.<n>In this paper, we introduce a new AR paradigm that reformulates image generation from next-token prediction to next-tensor prediction.
arXiv Detail & Related papers (2025-05-22T07:27:25Z)
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps [16.84310001807895]
This paper introduces a model-agnostic approach that can be applied to A-RAG methods.<n>Specifically, we use cache access and parallel generation to speed up the prefilling and decoding stages respectively.
arXiv Detail & Related papers (2025-05-19T05:39:38Z)
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.<n>We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.<n>We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction. Few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z)
Randomized Autoregressive Visual Generation [26.195148077398223]
This paper presents Randomized AutoRegressive modeling (RAR) for visual generation. RAR sets a new state-of-the-art performance on the image generation task while maintaining full compatibility with language modeling frameworks. On the ImageNet-256 benchmark, RAR achieves an FID score of 1.48, not only surpassing prior state-the-art autoregressive image generators but also outperforming leading diffusion-based and masked transformer-based methods.
arXiv Detail & Related papers (2024-11-01T17:59:58Z)
DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration [5.573836220587265]
This work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR) DRM-IR consists of task-adaptive degradation modeling and model-based image restoring. Experiments on multiple benchmark datasets show that our DRM-IR achieves state-of-the-art in All-In-One IR.
arXiv Detail & Related papers (2023-07-15T02:42:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.