3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI
Image Similarity Challenge
- URL: http://arxiv.org/abs/2112.02373v1
- Date: Sat, 4 Dec 2021 16:25:24 GMT
- Title: 3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI
Image Similarity Challenge
- Authors: Xinlong Sun, Yangyang Qin, Xuyuan Xu, Guoping Gong, Yang Fang, Yexin
Wang
- Abstract summary: This paper presents our 3rd place solution to the matching track of the Image Similarity Challenge (ISC) 2021 organized by Facebook AI.
We propose a multi-branch retrieval method that combines global descriptors and local descriptors to cover all attack cases.
We show ablation experiments of our method, which reveal the complementary advantages of global and local features.
- Score: 2.4340897078287815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a basic task of computer vision, image similarity retrieval faces the
challenges of large-scale data and image copy attacks. This paper presents our
3rd place solution to the matching track of the Image Similarity Challenge (ISC)
2021 organized by Facebook AI. We propose a multi-branch retrieval method that
combines global descriptors and local descriptors to cover all attack cases.
Specifically, we explore many strategies to optimize the global descriptors,
including abundant data augmentations, self-supervised learning with a single
Transformer model, and overlay-detection preprocessing. Moreover, we introduce
robust SIFT features and GPU Faiss for local retrieval, which makes up for the
shortcomings of global retrieval. Finally, a KNN-matching algorithm is used to
judge matches and merge scores. We show ablation experiments of our method,
which reveal the complementary advantages of global and local features.
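
The abstract only names the building blocks of the local branch (SIFT features, GPU Faiss, KNN matching, score merging). As a rough illustration of how such a branch could be wired up, here is a minimal, hypothetical Python sketch; the function names, the ratio-test threshold, and the merging weight are assumptions made for illustration and are not the authors' released code.

```python
import cv2
import numpy as np
import faiss

def sift_descriptors(image_path, max_kp=500):
    """Extract up to max_kp SIFT descriptors (128-dim float32) from an image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create(nfeatures=max_kp)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros((0, 128), dtype=np.float32)
    return desc.astype(np.float32)

def build_local_index(ref_paths, use_gpu=False):
    """Stack all reference descriptors into one Faiss L2 index and remember each row's image id."""
    descs, owners = [], []
    for i, path in enumerate(ref_paths):
        d = sift_descriptors(path)
        descs.append(d)
        owners.extend([i] * len(d))
    xb = np.vstack(descs)
    index = faiss.IndexFlatL2(xb.shape[1])
    if use_gpu:  # requires the faiss-gpu build
        index = faiss.index_cpu_to_gpu(faiss.StandardGpuResources(), 0, index)
    index.add(xb)
    return index, np.asarray(owners)

def local_scores(query_path, index, owners, ratio=0.8):
    """KNN-match query descriptors and count ratio-test hits per reference image."""
    dq = sift_descriptors(query_path)
    if len(dq) == 0:
        return {}
    dist, idx = index.search(dq, 2)        # two nearest reference descriptors per query descriptor
    scores = {}
    for (d1, d2), (i1, _) in zip(dist, idx):
        if d1 < (ratio ** 2) * d2:         # Lowe's ratio test on squared L2 distances
            ref = int(owners[i1])
            scores[ref] = scores.get(ref, 0) + 1
    return scores

def merge_scores(global_scores, local_branch, w_local=0.5):
    """Combine global-branch similarities with local-branch match counts (illustrative weighting)."""
    merged = dict(global_scores)
    for ref, s in local_branch.items():
        merged[ref] = merged.get(ref, 0.0) + w_local * s
    return merged
```

In this sketch, the global branch would supply `global_scores` (for example, similarities of Transformer embeddings retrieved with a separate Faiss index), and the merged score would then be thresholded to decide whether a query image is a copy of a reference image.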
Related papers
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities [88.398085358514]
Contrastive Deepfake Embeddings (CoDE) is a novel embedding space specifically designed for deepfake detection.
CoDE is trained via contrastive learning by additionally enforcing global-local similarities.
arXiv Detail & Related papers (2024-07-29T18:00:10Z)
- Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models [44.437693135170576]
We propose a new framework, LMM with Sophisticated Tasks, Local image compression, and Mixture of global Experts (SliME).
We extract contextual information from the global view using a mixture of adapters, based on the observation that different adapters excel at different tasks.
The proposed method achieves leading performance across various benchmarks with only 2 million training samples.
arXiv Detail & Related papers (2024-06-12T17:59:49Z)
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications.
We propose a mechanism that attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
- DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)
- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations for AI-synthesized image detection by combining multi-scale global features from the whole image with refined local features from informative patches.
GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for detailed local artifact extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
- Global-and-Local Collaborative Learning for Co-Salient Object Detection [162.62642867056385]
The goal of co-salient object detection (CoSOD) is to discover salient objects that commonly appear in a query group containing two or more relevant images.
We propose a global-and-local collaborative learning architecture, which includes a global correspondence modeling (GCM) module and a local correspondence modeling (LCM) module.
The proposed GLNet is evaluated on three prevailing CoSOD benchmark datasets, demonstrating that our model trained on a small dataset (about 3k images) still outperforms eleven state-of-the-art competitors trained on larger datasets (about 8k-200k images).
arXiv Detail & Related papers (2022-04-19T14:32:41Z)
- Reuse your features: unifying retrieval and feature-metric alignment [3.845387441054033]
DRAN is the first network able to produce the features for the three steps of visual localization.
It achieves competitive performance in terms of robustness and accuracy under challenging conditions in public benchmarks.
arXiv Detail & Related papers (2022-04-13T10:42:00Z)
- Global-Local Context Network for Person Search [125.51080862575326]
Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
arXiv Detail & Related papers (2021-12-05T07:38:53Z)
- DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features [42.62089148690047]
We propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval.
It first attentively extracts representative local information with multi-atrous convolutions and self-attention.
The whole framework is end-to-end differentiable and can be trained with image-level labels.
arXiv Detail & Related papers (2021-08-06T03:14:09Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.