Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images
- URL: http://arxiv.org/abs/2311.14084v4
- Date: Mon, 27 May 2024 03:53:05 GMT
- Title: Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images
- Authors: Shicheng Xu, Danyang Hou, Liang Pang, Jingcheng Deng, Jun Xu, Huawei Shen, Xueqi Cheng
- Abstract summary: We show that AI-generated images introduce an invisible relevance bias to text-image retrieval models.
The inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias.
We propose an effective training method aimed at alleviating the invisible relevance bias.
- Score: 67.18010640829682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of generation models, AI-generated content (AIGC) is becoming more realistic, flooding the Internet. A recent study suggests that this phenomenon causes source bias in text retrieval for web search. Specifically, neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. Firstly, we successfully construct a suitable benchmark to explore the existence of the bias. Subsequent extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, our experiments show that text-image retrieval models tend to rank the AI-generated images higher than the real images, even though the AI-generated images do not exhibit more visually relevant features to the query than real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that the inclusion of AI-generated images in the training data of the retrieval models exacerbates the invisible relevance bias. The above phenomenon triggers a vicious cycle, which makes the invisible relevance bias become more and more serious. To elucidate the potential causes of invisible relevance and address the aforementioned issues, we introduce an effective training method aimed at alleviating the invisible relevance bias. Subsequently, we apply our proposed debiasing method to retroactively identify the causes of invisible relevance, revealing that the AI-generated images induce the image encoder to embed additional information into their representation. This information exhibits a certain consistency across generated images with different semantics and can make the retriever estimate a higher relevance score.
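The ranking preference described in the abstract can be illustrated with a minimal sketch. It assumes each query has one paired real image and one paired AI-generated image, all already embedded as vectors, and that relevance is scored by cosine similarity. The function name `generated_preference_rate` and the toy vectors are hypothetical; this is a simplified illustration, not the metric or benchmark used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def generated_preference_rate(queries, real_images, generated_images):
    """Fraction of queries whose generated image outscores its real image.

    Hypothetical helper: the three lists are aligned by index, so
    real_images[i] and generated_images[i] both correspond to queries[i].
    """
    wins = 0
    for q, real, gen in zip(queries, real_images, generated_images):
        if cosine(q, gen) > cosine(q, real):
            wins += 1
    return wins / len(queries)


# Toy embeddings (made up for illustration only).
queries = [[1.0, 0.0], [0.0, 1.0]]
real_images = [[0.8, 0.6], [0.0, 1.0]]
generated_images = [[1.0, 0.0], [0.6, 0.8]]

rate = generated_preference_rate(queries, real_images, generated_images)
print(rate)
```

A rate well above 0.5 on semantically matched real and generated pairs would suggest a preference for generated images. A real study would also need to control for visual relevance, as the abstract notes.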
Related papers
- A Sanity Check for AI-generated Image Detection [49.08585395873425]
We present a sanity check on whether the task of AI-generated image detection has been solved.
To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on Chameleon dataset.
We propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns.
Date: 2024-06-27T17:59:49Z
- RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection [60.960988614701414]
RIGID is a training-free and model-agnostic method for robust AI-generated image detection.
RIGID significantly outperforms existing training-based and training-free detectors.
Date: 2024-05-30T14:49:54Z
- Would Deep Generative Models Amplify Bias in Future Models? [29.918422914275226]
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models.
We conduct simulations by substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion.
Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias.
Date: 2024-04-04T06:58:39Z
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
Date: 2023-10-03T06:55:19Z
- CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images [7.868449549351487]
This article proposes to enhance our ability to recognise AI-generated images through computer vision.
The two sets of data present as a binary classification problem with regard to whether the photograph is real or generated by AI.
This study proposes the use of a Convolutional Neural Network (CNN) to classify the images into two categories: Real or Fake.
Date: 2023-03-24T16:33:06Z
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
Date: 2022-09-29T00:57:28Z
- Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval [85.03655458677295]
Image-text retrieval has gradually become a major research direction in the field of information retrieval.
We first examine the related concerns and why the focus is on image-text retrieval tasks.
We analyze various aspects of the reproduction of pretrained and nonpretrained retrieval models.
Date: 2022-03-08T05:01:43Z
This list is automatically generated from the titles and abstracts of the papers in this site.