Related papers: Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval

URL: http://arxiv.org/abs/2103.10729v1
Date: Fri, 19 Mar 2021 10:54:51 GMT
Title: Connecting Images through Time and Sources: Introducing Low-data, Heterogeneous Instance Retrieval
Authors: Dimitri Gominski and Valérie Gouet-Brunet and Liming Chen
Abstract summary: We show that it is not trivial to pick features responding well to a panel of variations and semantic content. Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
Score: 3.6526118822907594
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With impressive results in applications relying on feature learning, deep learning has also blurred the line between algorithm and data. Pick a training dataset, pick a backbone network for feature extraction, and voilà; this usually works for a variety of use cases. But the underlying hypothesis that there exists a training dataset matching the use case is not always met. Moreover, the demand for interconnections regardless of the variations of the content calls for increasing generalization and robustness in features. An interesting application characterized by these problematics is the connection of historical and cultural databases of images. Through the seemingly simple task of instance retrieval, we propose to show that it is not trivial to pick features responding well to a panel of variations and semantic content. Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations. We further give insights about the core problems in instance retrieval, testing four state-of-the-art additional techniques to increase performance.

Related papers

Image Retrieval Methods in the Dissimilarity Space [10.00342846297521]
We argue that the feature dissimilarity space is more suitable for similarity matching. We also propose a dichotomy transformation to project query and reference embeddings into a single embedding in the dissimilarity space. As opposed to comparing the distance between queries and reference embeddings, we show the benefits of classifying the single dissimilarity space embedding.
arXiv Detail & Related papers (2024-12-11T18:39:32Z)
Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling. We generate supplemental training pairs by altering the most important part of a search context. We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification [9.843214426749764]
We propose retrieval-enhanced visual prompt learning (RePrompt) to cache and reuse knowledge of downstream tasks. During inference, our enhanced model can reference similar samples brought by retrieval to make more accurate predictions. RePrompt attains state-of-the-art performance on a wide range of vision datasets.
arXiv Detail & Related papers (2023-06-04T03:06:37Z)
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images. We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities. The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes. While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories. Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
arXiv Detail & Related papers (2022-03-11T19:28:19Z)
A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images [15.75291664088815]
A major issue concerning current deep neural architectures is known as catastrophic forgetting. We propose a contrastive regularization, where any given input is compared with its augmented version. We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test.
arXiv Detail & Related papers (2021-12-07T16:44:45Z)
SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval [15.522964295287425]
We propose a novel loss function that is based on self-labeling of the unknown classes. We tested our approach on several real-world cross-modal retrieval problems, including text-based video retrieval, sketch-based image retrieval, and image-text retrieval.
arXiv Detail & Related papers (2021-11-10T17:17:09Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
Cross-Modal Retrieval Augmentation for Multi-Modal Classification [61.5253261560224]
We explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering. First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement on image-caption retrieval. Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines.
arXiv Detail & Related papers (2021-04-16T13:27:45Z)
Part2Whole: Iteratively Enrich Detail for Cross-Modal Retrieval with Partial Query [25.398090300086302]
We propose an interactive retrieval framework called Part2Whole to tackle this problem. An Interactive Retrieval Agent is trained to build an optimal policy to refine the initial query. We present a weakly-supervised reinforcement learning method that needs no human-annotated data other than the text-image dataset.
arXiv Detail & Related papers (2021-03-02T11:27:05Z)
Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms. We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching. Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.