Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations
- URL: http://arxiv.org/abs/2309.01858v1
- Date: Mon, 4 Sep 2023 23:18:38 GMT
- Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Bingyi Cao, Mário
Lipovský, Pelin Dogan-Schönberger, Grzegorz Makosa, Boris Bluntschli,
Mojtaba Seyedhosseini, Ondřej Chum, André Araujo
- Abstract summary: We address the problem of universal image embedding, where a single universal model is trained and used in multiple domains.
First, we leverage existing domain-specific datasets to carefully construct a new large-scale public benchmark for the evaluation of universal image embeddings.
Second, we provide a comprehensive experimental evaluation on the new dataset, demonstrating that existing approaches and simplistic extensions lead to worse performance than an assembly of models trained for each domain separately.
- Score: 4.606379774346321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-grained and instance-level recognition methods are commonly trained and
evaluated on specific domains, in a model per domain scenario. Such an
approach, however, is impractical in real large-scale applications. In this
work, we address the problem of universal image embedding, where a single
universal model is trained and used in multiple domains. First, we leverage
existing domain-specific datasets to carefully construct a new large-scale
public benchmark for the evaluation of universal image embeddings, with 241k
query images, 1.4M index images and 2.8M training images across 8 different
domains and 349k classes. We define suitable metrics, training and evaluation
protocols to foster future research in this area. Second, we provide a
comprehensive experimental evaluation on the new dataset, demonstrating that
existing approaches and simplistic extensions lead to worse performance than an
assembly of models trained for each domain separately. Finally, we conducted a
public research competition on this topic, leveraging industrial datasets,
which attracted the participation of more than 1k teams worldwide. This
exercise generated many interesting research ideas and findings which we
present in detail. Project webpage: https://cmp.felk.cvut.cz/univ_emb/
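The benchmark above scores universal embeddings by retrieval quality over a labeled index set. As a minimal illustrative sketch (the function, metric choice, and toy data below are assumptions for illustration, not the paper's actual evaluation code), a precision@k evaluation of an embedding might look like:

```python
import numpy as np

def precision_at_k(query_emb, index_emb, query_labels, index_labels, k=5):
    """Mean precision@k for embedding retrieval: for each query, retrieve
    the k nearest index embeddings by cosine similarity and count how many
    share the query's class label."""
    # L2-normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    x = index_emb / np.linalg.norm(index_emb, axis=1, keepdims=True)
    sims = q @ x.T                               # (num_queries, num_index)
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of k nearest items
    hits = index_labels[topk] == query_labels[:, None]
    return hits.mean()

# Toy example: two well-separated classes in an 8-dim embedding space
rng = np.random.default_rng(0)
index_emb = np.vstack([rng.normal(0, 0.1, (10, 8)) + 1,
                       rng.normal(0, 0.1, (10, 8)) - 1])
index_labels = np.array([0] * 10 + [1] * 10)
query_emb = np.vstack([rng.normal(0, 0.1, (3, 8)) + 1,
                       rng.normal(0, 0.1, (3, 8)) - 1])
query_labels = np.array([0, 0, 0, 1, 1, 1])
print(precision_at_k(query_emb, index_emb, query_labels, index_labels, k=5))
```

In a multi-domain setting such as this benchmark, the same routine would be run per domain and the scores averaged, so that a single model is judged on all domains at once.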
Related papers
- WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle the discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z) - Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images [63.58800688320182]
Domain Generalization is a challenging task in machine learning.
Current methodology lacks a quantitative understanding of stylistic domain shifts.
We introduce a new DG paradigm to address these risks.
arXiv Detail & Related papers (2024-05-24T22:13:31Z) - FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding [7.272083488859574]
We introduce a new dataset for benchmarking visual search methods on flat images with diverse patterns.
Our flat object retrieval benchmark (FORB) supplements the commonly adopted 3D object domain.
It serves as a testbed for assessing the image embedding quality on out-of-distribution domains.
arXiv Detail & Related papers (2023-09-28T08:41:51Z) - Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an effective deep learning model requires large datasets with diverse styles and qualities.
A novel contrastive learning method is developed to equip deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z) - FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning [82.75157675790553]
We propose a Few-shot guided Prototypical (FoPro) representation learning method.
FoPro is trained on web datasets with guidance from a few real-world examples and evaluated on real-world datasets.
Our method achieves the state-of-the-art performance on three fine-grained datasets and two large-scale datasets.
arXiv Detail & Related papers (2022-12-01T12:39:03Z) - Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain, as well as domains we want to extend to but lack data for, can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
arXiv Detail & Related papers (2022-10-18T01:14:02Z) - Current Trends in Deep Learning for Earth Observation: An Open-source Benchmark Arena for Image Classification [7.511257876007757]
'AiTLAS: Benchmark Arena' is an open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification.
We present a comprehensive comparative analysis of more than 400 models derived from nine different state-of-the-art architectures.
arXiv Detail & Related papers (2022-07-14T20:18:58Z) - The Met Dataset: Instance-level Recognition for Artworks [19.43143591288768]
This work introduces a dataset for large-scale instance-level recognition in the domain of artworks.
We rely on the open access collection of The Met museum to form a large training set of about 224k classes.
arXiv Detail & Related papers (2022-02-03T18:13:30Z) - A Universal Representation Transformer Layer for Few-Shot Image Classification [43.31379752656756]
Few-shot classification aims to recognize unseen classes when presented with only a small number of samples.
We consider the problem of multi-domain few-shot image classification, where unseen classes and examples come from diverse data sources.
Here, we propose a Universal Representation Transformer layer, that meta-learns to leverage universal features for few-shot classification.
arXiv Detail & Related papers (2020-06-21T03:08:00Z) - Unifying Specialist Image Embedding into Universal Image Embedding [84.0039266370785]
It is desirable to have a universal deep embedding model applicable to various domains of images.
We propose to distill the knowledge in multiple specialists into a universal embedding to solve this problem.
arXiv Detail & Related papers (2020-03-08T02:51:11Z)
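The last entry above distills the knowledge of multiple domain specialists into a single universal embedding. As a rough sketch of one way such a distillation objective could be formed (the chunking and normalization choices here are assumptions for illustration, not the cited paper's exact method), the universal embedding can be split into per-domain chunks and each chunk regressed onto its specialist's output:

```python
import numpy as np

def distillation_loss(universal_emb, specialist_embs):
    """Split the universal embedding into one chunk per specialist and
    regress each L2-normalized chunk onto the corresponding L2-normalized
    specialist embedding (mean squared error, averaged over specialists)."""
    chunks = np.split(universal_emb, len(specialist_embs), axis=1)
    loss = 0.0
    for chunk, spec in zip(chunks, specialist_embs):
        chunk_n = chunk / np.linalg.norm(chunk, axis=1, keepdims=True)
        spec_n = spec / np.linalg.norm(spec, axis=1, keepdims=True)
        loss += np.mean(np.sum((chunk_n - spec_n) ** 2, axis=1))
    return loss / len(specialist_embs)

# Sanity check: a universal embedding that already matches both
# specialists exactly incurs zero distillation loss.
rng = np.random.default_rng(1)
spec_a = rng.normal(size=(4, 16))
spec_b = rng.normal(size=(4, 16))
universal = np.hstack([spec_a, spec_b])
print(distillation_loss(universal, [spec_a, spec_b]))  # → 0.0
```

In practice the specialists are frozen teacher networks and the universal model is trained to minimize this loss over images from all domains, so a single student inherits every specialist's domain knowledge.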
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.