A Convolutional Baseline for Person Re-Identification Using Vision and
Language Descriptions
- URL: http://arxiv.org/abs/2003.00808v1
- Date: Thu, 20 Feb 2020 10:12:02 GMT
- Title: A Convolutional Baseline for Person Re-Identification Using Vision and
Language Descriptions
- Authors: Ammarah Farooq, Muhammad Awais, Fei Yan, Josef Kittler, Ali Akbari,
and Syed Safwan Khalid
- Abstract summary: In real-world surveillance scenarios, frequently no visual information will be available about the queried person.
A two-stream deep convolutional neural network framework supervised by cross-entropy loss is presented.
The learnt visual representations are more robust and perform 22% better during retrieval than those of a single-modality system.
- Score: 24.794592610444514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical person re-identification approaches assume that a person of
interest has appeared across different cameras and can be queried by one of the
existing images. However, in real-world surveillance scenarios, frequently no
visual information will be available about the queried person. In such
scenarios, a natural language description of the person by a witness will
provide the only source of information for retrieval. In this work, person
re-identification using both vision and language information is addressed under
all possible gallery and query scenarios. A two-stream deep convolutional
neural network framework supervised by cross-entropy loss is presented. The
weights connecting the second-last layer to the final layer that produces the
class probabilities, i.e., the softmax logits, are shared between the two networks.
Canonical Correlation Analysis is performed to enhance the correlation between
the two modalities in a joint latent embedding space. To investigate the
benefits of the proposed approach, a new testing protocol under a multimodal
ReID setting is proposed for the test split of the CUHK-PEDES and CUHK-SYSU
benchmarks. The experimental results verify the merits of the proposed system.
The learnt visual representations are more robust and perform 22% better
during retrieval than those of a single-modality system. Retrieval with a
multimodal query greatly enhances the re-identification capability of the
system, both quantitatively and qualitatively.
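The CCA step described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumed, classical formulation of CCA applied to two sets of modality embeddings, not the authors' code; the function name, dimensions, and toy data are illustrative.

```python
import numpy as np

def cca_projections(X, Y, k=2, eps=1e-6):
    """Classical CCA: find linear projections of two views (e.g. visual
    and textual embeddings) that maximize their correlation in a joint
    latent space. Returns the two projection matrices and the top-k
    canonical correlations."""
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whiten each view via its Cholesky factor, then SVD the whitened
    # cross-covariance; the singular values are the canonical correlations.
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Cxx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Lx_inv @ Cxy @ Ly_inv.T)
    Wx = Lx_inv.T @ U[:, :k]   # projection for the first modality
    Wy = Ly_inv.T @ Vt[:k].T   # projection for the second modality
    return Wx, Wy, s[:k]

# Toy usage: two views that share a latent factor correlate strongly
# after projection into the joint space.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))                        # shared latent factor
X = z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(200, 4))
Wx, Wy, corr = cca_projections(X, Y)
```

In the paper's setting, X and Y would hold the visual and textual embeddings of the same identities, and retrieval would compare X @ Wx against Y @ Wy in the shared latent space.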
Related papers
- A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning [9.786907179872815]
The potential of vision and language remains underexplored in face forgery detection.
There is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task.
We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap.
arXiv Detail & Related papers (2024-10-01T08:16:40Z)
- ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
- Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models [44.60439935450292]
We propose a novel method for zero-shot visual recognition: RECODE.
It decomposes each predicate category into subject, object, and spatial components.
Different visual cues enhance the discriminability of similar relation categories from different perspectives.
arXiv Detail & Related papers (2023-05-21T14:40:48Z)
- Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address this issue by employing body clues provided by an extra network to distinguish the visible part.
We propose a novel Dynamic Prototype Mask (DPM) based on two self-evident prior knowledge.
Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Unsupervised Contrastive Hashing for Cross-Modal Retrieval in Remote Sensing [1.6758573326215689]
Cross-modal text-image retrieval has attracted great attention in remote sensing.
We introduce a novel unsupervised cross-modal contrastive hashing (DUCH) method for text-image retrieval in RS.
Experimental results show that the proposed DUCH outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-04-19T07:25:25Z)
- Global-Local Context Network for Person Search [125.51080862575326]
Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
arXiv Detail & Related papers (2021-12-05T07:38:53Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weight method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
- Gait Recognition using Multi-Scale Partial Representation Transformation with Capsules [22.99694601595627]
We propose a novel deep network, learning to transfer multi-scale partial gait representations using capsules.
Our network first obtains multi-scale partial representations using a state-of-the-art deep partial feature extractor.
It then recurrently learns the correlations and co-occurrences of the patterns among the partial features in forward and backward directions.
arXiv Detail & Related papers (2020-10-18T19:47:38Z)
- Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL. Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z)
- FMT: Fusing Multi-task Convolutional Neural Network for Person Search [33.91664470686695]
We propose a fusing multi-task convolutional neural network (FMT-CNN) to tackle the correlation and heterogeneity of detection and re-identification.
Experiment results on CUHK-SYSU Person Search dataset show that the performance of our proposed method is superior to state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-01T05:20:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.