Image-text Retrieval: A Survey on Recent Research and Development
- URL: http://arxiv.org/abs/2203.14713v1
- Date: Mon, 28 Mar 2022 13:00:01 GMT
- Title: Image-text Retrieval: A Survey on Recent Research and Development
- Authors: Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
- Abstract summary: Cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application.
This paper presents a comprehensive and up-to-date survey on the ITR approaches from four perspectives.
- Score: 58.060687870247996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the past few years, cross-modal image-text retrieval (ITR) has experienced
increased interest in the research community due to its excellent research
value and broad real-world application. It is designed for the scenarios where
the queries are from one modality and the retrieval galleries from another
modality. This paper presents a comprehensive and up-to-date survey on the ITR
approaches from four perspectives. By dissecting an ITR system into two
processes: feature extraction and feature alignment, we summarize the recent
advances of the ITR approaches from these two perspectives. On top of this, the
efficiency-focused study on the ITR system is introduced as the third
perspective. To keep pace with the times, we also provide a pioneering overview
of the cross-modal pre-training ITR approaches as the fourth perspective.
Finally, we outline the common benchmark datasets and evaluation metrics for ITR,
and conduct the accuracy comparison among the representative ITR approaches.
Some critical yet less studied issues are discussed at the end of the paper.
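
The retrieval setting described in the abstract (queries from one modality, a gallery from the other, compared after feature extraction and alignment) can be illustrated with a small sketch. It assumes both modalities have already been embedded into a shared vector space and scores pairs by cosine similarity; it then computes Recall@K, a metric commonly reported for ITR. The function names, the toy vectors, and the choice of cosine similarity are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, gallery, ground_truth, k):
    """Fraction of queries whose matching gallery item appears in the top k results."""
    hits = 0
    for query, target in zip(queries, ground_truth):
        if target in rank_gallery(query, gallery)[:k]:
            hits += 1
    return hits / len(queries)


# Hypothetical toy embeddings: text queries and image gallery in a shared 2-D space.
text_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_gallery = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
ground_truth = [0, 1, 2]  # index of the matching image for each text query

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(text_queries, image_gallery, ground_truth, k):.3f}")
```

Swapping the roles of `text_queries` and `image_gallery` gives the image-to-text direction; real systems would replace the toy vectors with learned features.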
Related papers
- A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements.
This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation.
It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
arXiv Detail & Related papers (2024-04-17T01:27:42Z)
- A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
This survey delves into the realm of interpretable cross-modal reasoning (I-CMR).
This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
- A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers [76.51245425667845]
Relation extraction (RE) involves identifying the relations between entities from underlying content.
Deep neural networks have dominated the field of RE and made noticeable progress.
This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.
arXiv Detail & Related papers (2023-06-03T08:39:25Z)
- Reviewer assignment problem: A scoping review [0.0]
The quality of peer review depends on the ability to recruit adequate reviewers for submitted papers.
Finding such reviewers is an increasingly difficult task due to several factors.
Solutions for automated association of papers with "well matching" reviewers have been the subject of research for thirty years now.
arXiv Detail & Related papers (2023-05-13T10:13:43Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning.
Recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics.
We present the first trial in this paper to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z)
- A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query.
To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z)
- Cross-Domain Recommendation: Challenges, Progress, and Prospects [21.60393384976869]
Cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain.
In this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and future directions.
arXiv Detail & Related papers (2021-03-02T12:58:08Z)