Related papers: IRGen: Generative Modeling for Image Retrieval

URL: http://arxiv.org/abs/2303.10126v4
Date: Tue, 23 Jul 2024 23:52:19 GMT
Title: IRGen: Generative Modeling for Image Retrieval
Authors: Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Jingdong Wang, Baining Guo
Abstract summary: In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling. We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units. Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
Score: 82.62022344988993
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While generative modeling has become prevalent across numerous research fields, its integration into the realm of image retrieval remains largely unexplored and underjustified. In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling and employing a sequence-to-sequence model. This approach is harmoniously aligned with the current trend towards unification in research, presenting a cohesive framework that allows for end-to-end differentiable searching. This, in turn, facilitates superior performance via direct optimization techniques. The development of our model, dubbed IRGen, addresses the critical technical challenge of converting an image into a concise sequence of semantic units, which is pivotal for enabling efficient and effective search. Extensive experiments demonstrate that our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks as well as two million-scale datasets, yielding significant improvement compared to prior competitive retrieval methods. In addition, the notable surge in precision scores facilitated by generative modeling presents the potential to bypass the reranking phase, which is traditionally indispensable in practical retrieval workflows.

Related papers

Modelship Attribution: Tracing Multi-Stage Manipulations Across Generative Models [37.368187232084324]
"Modelship Attribution" aims to trace the evolution of manipulated images by identifying the generative models involved and reconstructing the sequence of edits they performed. We introduce the modelship attribution transformer (MAT), a framework designed to effectively recognize and attribute the contributions of various models within complex, multi-stage manipulation.
arXiv Detail & Related papers (2025-06-03T03:45:09Z)
Exploring Training and Inference Scaling Laws in Generative Retrieval [50.82554729023865]
We investigate how model size, training data scale, and inference-time compute jointly influence generative retrieval performance. Our experiments show that n-gram-based methods demonstrate strong alignment with both training and inference scaling laws. We find that LLaMA models consistently outperform T5 models, suggesting a particular advantage for larger decoder-only models in generative retrieval.
arXiv Detail & Related papers (2025-03-24T17:59:03Z)
HRR: Hierarchical Retrospection Refinement for Generated Image Detection [16.958383381415445]
We propose a diffusion model-based generative image detection framework termed Hierarchical Retrospection Refinement (HRR). The HRR framework consistently delivers significant performance improvements, outperforming state-of-the-art methods in the generated image detection task.
arXiv Detail & Related papers (2025-02-25T05:13:44Z)
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [77.86514804787622]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z)
Distillation of Diffusion Features for Semantic Correspondence [23.54555663670558]
We propose a novel knowledge distillation technique to overcome the problem of reduced efficiency. We show how to use two large vision foundation models and distill the capabilities of these complementary models into one smaller model that maintains high accuracy at reduced computational cost. Our empirical results demonstrate that our distilled model with 3D data augmentation achieves performance superior to current state-of-the-art methods while significantly reducing computational load and enhancing practicality for real-world applications, such as semantic video correspondence.
arXiv Detail & Related papers (2024-12-04T17:55:33Z)
A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks. Our approach enables versatile capabilities via different inference-time sampling schemes. Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z)
Fashion Image-to-Image Translation for Complementary Item Retrieval [13.88174783842901]
We introduce the Generative Compatibility Model (GeCo), a two-stage approach that improves fashion image retrieval through paired image-to-image translation. Evaluations on three datasets show that GeCo outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-08-19T09:50:20Z)
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis [22.11487736315616]
Rectified flow is a recent generative model formulation that connects data and noise in a straight line. We improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. We present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities.
arXiv Detail & Related papers (2024-03-05T18:45:39Z)
RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the usage of models that could process text input and generate high fidelity images based on text descriptions. Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noises with repeating steps. In the era of large models, scaling up model size and the integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods on image restoration. We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR. We propose five potential and challenging directions for the future research of diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z)
Unified Framework for Histopathology Image Augmentation and Classification via Generative Models [6.404713841079193]
We propose an innovative unified framework that integrates the data generation and model training stages into a unified process. Our approach utilizes a pure Vision Transformer (ViT)-based conditional Generative Adversarial Network (cGAN) model to simultaneously handle both image synthesis and classification. Our experiments show that our unified synthetic augmentation framework consistently enhances the performance of histopathology image classification models.
arXiv Detail & Related papers (2022-12-20T03:40:44Z)
A Visual Navigation Perspective for Category-Level Object Pose Estimation [41.60364392204057]
This paper studies category-level object pose estimation based on a single monocular image. Recent advances in pose-aware generative models have paved the way for addressing this challenging task using analysis-by-synthesis.
arXiv Detail & Related papers (2022-03-25T10:57:37Z)
A Generic Approach for Enhancing GANs by Regularized Latent Optimization [79.00740660219256]
We introduce a generic framework called generative-model inference that is capable of enhancing pre-trained GANs effectively and seamlessly. Our basic idea is to efficiently infer the optimal latent distribution for the given requirements using Wasserstein gradient flow techniques.
arXiv Detail & Related papers (2021-12-07T05:22:50Z)
Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.