Related papers: Image-text Retrieval: A Survey on Recent Research and Development

URL: http://arxiv.org/abs/2203.14713v1
Date: Mon, 28 Mar 2022 13:00:01 GMT
Title: Image-text Retrieval: A Survey on Recent Research and Development
Authors: Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
Abstract summary: Cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application. This paper presents a comprehensive and up-to-date survey on the ITR approaches from four perspectives.
Score: 58.060687870247996
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the past few years, cross-modal image-text retrieval (ITR) has experienced increased interest in the research community due to its excellent research value and broad real-world application. It is designed for scenarios where the queries are from one modality and the retrieval galleries are from another. This paper presents a comprehensive and up-to-date survey of ITR approaches from four perspectives. By dissecting an ITR system into two processes, feature extraction and feature alignment, we summarize the recent advances in ITR approaches from these two perspectives. On top of this, the efficiency-focused study of ITR systems is introduced as the third perspective. To keep pace with the times, we also provide a pioneering overview of the cross-modal pre-training ITR approaches as the fourth perspective. Finally, we outline the common benchmark datasets and evaluation metrics for ITR, and conduct an accuracy comparison among the representative ITR approaches. Some critical yet less studied issues are discussed at the end of the paper.

Related papers

RWESummary: A Framework and Test for Choosing Large Language Models to Summarize Real-World Evidence (RWE) Studies [0.0]
Large Language Models (LLMs) have been extensively evaluated for general summarization tasks as well as medical research assistance. We introduce RWESummary, a proposed addition to the MedHELM framework to enable benchmarking of LLMs for this task. RWESummary includes one scenario and three evaluations covering major types of errors observed in summarization of medical research studies.
arXiv Detail & Related papers (2025-06-23T16:28:03Z)
Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts [10.225194259153426]
Clarifying the research framing of NLP artefacts is crucial to aligning research with practical applications. Recent studies manually analyzed NLP research across domains, showing that few papers explicitly identify key stakeholders, intended uses, or appropriate contexts. We develop a three-component system that infers research framings by first extracting key elements (means, ends, stakeholders), then linking them through interpretable rules and contextual reasoning.
arXiv Detail & Related papers (2025-05-24T12:46:26Z)
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation [56.87891213797931]
We present MTR-Bench for Large Language Models' Multi-Turn Reasoning evaluation. Comprising 4 classes, 40 tasks, and 3600 instances, MTR-Bench covers diverse reasoning capabilities. MTR-Bench features a fully automated framework spanning both dataset construction and model evaluation.
arXiv Detail & Related papers (2025-05-21T17:59:12Z)
A Comprehensive Survey on Composed Image Retrieval [54.54527281731775]
Composed Image Retrieval (CIR) is an emerging yet challenging task that allows users to search for target images using a multimodal query. There is currently no comprehensive review of CIR to provide a timely overview of this field. We synthesize insights from over 120 publications in top conferences and journals, including ACM TOIS, SIGIR, and CVPR.
arXiv Detail & Related papers (2025-02-19T01:37:24Z)
Data Augmentation for Sequential Recommendation: A Survey [9.913317029557588]
Sequential recommendation (SR) has received much attention due to its strong consistency with real-world situations. We provide a comprehensive review of data augmentation (DA) methods for SR.
arXiv Detail & Related papers (2024-09-20T14:39:42Z)
A Survey on Interpretable Cross-modal Reasoning [64.37362731950843]
Cross-modal reasoning (CMR) has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR). It presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR.
arXiv Detail & Related papers (2023-09-05T05:06:48Z)
A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers [76.51245425667845]
Relation extraction (RE) involves identifying the relations between entities from underlying content. Deep neural networks have dominated the field of RE and made noticeable progress. This survey is expected to facilitate researchers' collaborative efforts to address the challenges of real-world RE systems.
arXiv Detail & Related papers (2023-06-03T08:39:25Z)
Reviewer assignment problem: A scoping review [0.0]
The quality of peer review depends on the ability to recruit adequate reviewers for submitted papers. Finding such reviewers is an increasingly difficult task due to several factors. Solutions for automated association of papers with "well matching" reviewers have been the subject of research for thirty years now.
arXiv Detail & Related papers (2023-05-13T10:13:43Z)
Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches. This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning. Recent work shows that the TPAUC is essentially inconsistent with the existing Partial AUC metrics. We present the first trial in this paper to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z)
A Survey on Temporal Sentence Grounding in Videos [69.13365006222251]
Temporal sentence grounding in videos (TSGV) aims to localize one target segment from an untrimmed video with respect to a given sentence query. To the best of our knowledge, this is the first systematic survey on temporal sentence grounding.
arXiv Detail & Related papers (2021-09-16T15:01:46Z)
Cross-Domain Recommendation: Challenges, Progress, and Prospects [21.60393384976869]
Cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain. In this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and future directions.
arXiv Detail & Related papers (2021-03-02T12:58:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.