Connecting Images through Time and Sources: Introducing Low-data,
Heterogeneous Instance Retrieval
- URL: http://arxiv.org/abs/2103.10729v1
- Date: Fri, 19 Mar 2021 10:54:51 GMT
- Title: Connecting Images through Time and Sources: Introducing Low-data,
Heterogeneous Instance Retrieval
- Authors: Dimitri Gominski, Valérie Gouet-Brunet, and Liming Chen
- Abstract summary: We show that it is not trivial to pick features responding well to a panel of variations and semantic content.
Introducing a new enhanced version of the Alegoria benchmark, we compare descriptors using the detailed annotations.
- Score: 3.6526118822907594
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With impressive results in applications relying on feature learning, deep
learning has also blurred the line between algorithm and data. Pick a training
dataset, pick a backbone network for feature extraction, and voilà; this
usually works for a variety of use cases. But the underlying hypothesis that
there exists a training dataset matching the use case is not always met.
Moreover, the demand for interconnections regardless of variations in the
content calls for increasing generalization and robustness in features.
An interesting application characterized by these challenges is the
connection of historical and cultural image databases. Through the
seemingly simple task of instance retrieval, we show that it is not
trivial to pick features that respond well to a panel of variations and semantic
content. Introducing a new, enhanced version of the Alegoria benchmark, we
compare descriptors using its detailed annotations. We further give insights
into the core problems in instance retrieval, testing four additional
state-of-the-art techniques to increase performance.
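The pipeline the abstract describes (pick a backbone, extract descriptors, match instances) can be sketched as nearest-neighbor search over L2-normalized feature vectors. This is a minimal illustration, not the paper's code: the random vectors stand in for backbone descriptors, and `retrieve` is a hypothetical helper.

```python
import numpy as np

def retrieve(query_desc, database_descs, top_k=3):
    """Rank database images by cosine similarity to a query descriptor."""
    # L2-normalize so the dot product equals cosine similarity
    q = query_desc / np.linalg.norm(query_desc)
    db = database_descs / np.linalg.norm(database_descs, axis=1, keepdims=True)
    scores = db @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy descriptors standing in for backbone features of database images
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 128))
query = database[42] + 0.05 * rng.normal(size=128)  # a slightly perturbed view

ranks, scores = retrieve(query, database)
print(ranks[0])  # the matching instance should rank first
```

The hard part the paper studies is precisely what this sketch glosses over: choosing descriptors that keep the true match ranked first under strong variations in viewpoint, time, and source.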
Related papers
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Retrieval-Enhanced Visual Prompt Learning for Few-shot Classification [9.843214426749764]
We propose retrieval-enhanced visual prompt learning (RePrompt) to cache and reuse knowledge of downstream tasks.
During inference, our enhanced model can reference similar samples brought by retrieval to make more accurate predictions.
RePrompt attains state-of-the-art performance on a wide range of vision datasets.
arXiv Detail & Related papers (2023-06-04T03:06:37Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes.
While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories.
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
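The tail-first idea above can be sketched as sampling categories with probability inversely proportional to how often they have already been annotated. This is a hedged simplification for illustration; `tail_sample` and its weighting are hypothetical, not the paper's method.

```python
import numpy as np

def tail_sample(category_counts, budget):
    """Pick categories to query next, favoring the tail of the label distribution.

    Sampling weight is inversely proportional to how many annotations a
    category already has (plus one, so zero-count categories stay finite).
    """
    counts = np.asarray(category_counts, dtype=float)
    weights = 1.0 / (counts + 1.0)
    probs = weights / weights.sum()
    rng = np.random.default_rng(0)
    return rng.choice(len(counts), size=budget, replace=True, p=probs)

# Head-heavy distribution: category 0 dominates, categories 3 and 4 are rare
picks = tail_sample([500, 120, 40, 3, 1], budget=1000)
print(np.bincount(picks, minlength=5))  # most requests go to the tail
```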
arXiv Detail & Related papers (2022-03-11T19:28:19Z)
- A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images [15.75291664088815]
A major issue concerning current deep neural architectures is known as catastrophic forgetting.
We propose a contrastive regularization, where any given input is compared with its augmented version.
We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test.
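The comparison of an input with its augmented version can be sketched as a consistency loss between the two feature vectors. This is only an illustrative assumption (cosine-based consistency, illustrative names), not the paper's actual contrastive distillation objective.

```python
import numpy as np

def contrastive_consistency(feat, feat_aug):
    """Loss encouraging an input's features to match its augmented view.

    1 - cosine similarity: near zero when the two feature vectors align.
    """
    f = feat / np.linalg.norm(feat)
    g = feat_aug / np.linalg.norm(feat_aug)
    return 1.0 - float(f @ g)

rng = np.random.default_rng(0)
feat = rng.normal(size=64)
aligned = contrastive_consistency(feat, feat * 2.0)  # positively scaled copy, loss ~ 0
perturbed = contrastive_consistency(feat, feat + rng.normal(size=64))
print(aligned, perturbed)
```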
arXiv Detail & Related papers (2021-12-07T16:44:45Z)
- SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval [15.522964295287425]
We propose a novel loss function that is based on self-labeling of the unknown classes.
We tested our approach on several real-world cross-modal retrieval problems, including text-based video retrieval, sketch-based image retrieval, and image-text retrieval.
arXiv Detail & Related papers (2021-11-10T17:17:09Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Cross-Modal Retrieval Augmentation for Multi-Modal Classification [61.5253261560224]
We explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering.
First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement on image-caption retrieval.
Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines.
arXiv Detail & Related papers (2021-04-16T13:27:45Z)
- Part2Whole: Iteratively Enrich Detail for Cross-Modal Retrieval with Partial Query [25.398090300086302]
We propose an interactive retrieval framework called Part2Whole to tackle this problem.
An Interactive Retrieval Agent is trained to build an optimal policy to refine the initial query.
We present a weakly-supervised reinforcement learning method that needs no human-annotated data other than the text-image dataset.
arXiv Detail & Related papers (2021-03-02T11:27:05Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.