PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person
Search
- URL: http://arxiv.org/abs/2210.03433v1
- Date: Fri, 7 Oct 2022 10:04:12 GMT
- Title: PS-ARM: An End-to-End Attention-aware Relation Mixer Network for Person
Search
- Authors: Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer,
and Fahad Shahbaz Khan
- Abstract summary: We propose a novel attention-aware relation mixer (ARM) module for person search.
Our ARM module is generic and does not rely on fine-grained supervision or topological assumptions.
Our PS-ARM achieves state-of-the-art performance on both datasets.
- Score: 56.02761592710612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person search is a challenging problem with various real-world applications,
that aims at joint person detection and re-identification of a query person
from uncropped gallery images. Although previous studies focus on learning rich
feature information, it is still hard to retrieve the query person due
to appearance deformations and background distractors. In
this paper, we propose a novel attention-aware relation mixer (ARM) module for
person search, which exploits the global relation between different local
regions within the RoI of a person and makes it robust against various appearance
deformations and occlusion. The proposed ARM is composed of a relation mixer
block and a spatio-channel attention layer. The relation mixer block introduces
a spatially attended spatial mixing and a channel-wise attended channel mixing
for effectively capturing discriminative relation features within an RoI. These
discriminative relation features are further enriched by introducing a
spatio-channel attention where the foreground and background discriminability
is empowered in a joint spatio-channel space. Our ARM module is generic and
does not rely on fine-grained supervision or topological assumptions, so it
can be easily integrated into any Faster R-CNN based person search method.
Comprehensive experiments are performed on two challenging benchmark datasets:
CUHK-SYSU and PRW. Our PS-ARM achieves state-of-the-art performance on both
datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain
of 5 in the mAP score over SeqNet, while operating at a comparable speed.
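The abstract names three ingredients of the ARM module (spatially attended spatial mixing, channel-wise attended channel mixing, and a joint spatio-channel attention) without giving their exact form. The following is a minimal NumPy sketch of how such a block could be wired for a single RoI feature map. It is not the authors' implementation: the sigmoid gates, the single linear mixing layers, the residual connections, and every name in it (`arm_sketch`, `init_params`, `w_sgate`, `w_spatial`, `w_cgate`, `w_channel`, `w_joint`) are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(c, n, rng, scale=0.1):
    """Random weights for the sketch (hypothetical names and shapes)."""
    return {
        "w_sgate": scale * rng.standard_normal((c, 1)),    # per-location score
        "w_spatial": scale * rng.standard_normal((n, n)),  # mixes across locations
        "w_cgate": scale * rng.standard_normal((c, c)),    # per-channel score
        "w_channel": scale * rng.standard_normal((c, c)),  # mixes across channels
        "w_joint": scale * rng.standard_normal((c, c)),    # joint attention logits
    }


def arm_sketch(roi, params):
    """roi: (C, H, W) RoI feature map. Returns an array of the same shape."""
    c, h, w = roi.shape
    tokens = roi.reshape(c, h * w).T  # (N, C): one token per RoI location

    # Spatially attended spatial mixing (assumed form): gate each location
    # with a sigmoid score, then mix information across the N locations.
    spatial_gate = sigmoid(tokens @ params["w_sgate"])        # (N, 1)
    tokens = tokens + params["w_spatial"] @ (spatial_gate * tokens)

    # Channel-wise attended channel mixing (assumed form): gate each channel
    # using globally pooled features, then mix across the C channels.
    channel_gate = sigmoid(tokens.mean(axis=0) @ params["w_cgate"])  # (C,)
    tokens = tokens + (tokens * channel_gate) @ params["w_channel"]

    # Joint spatio-channel attention (assumed form): one weight per
    # (location, channel) pair, applied multiplicatively.
    joint = sigmoid(tokens @ params["w_joint"])               # (N, C)
    tokens = tokens * joint

    return tokens.T.reshape(c, h, w)


rng = np.random.default_rng(0)
roi = rng.standard_normal((8, 4, 2))          # toy RoI: 8 channels, 4x2 grid
out = arm_sketch(roi, init_params(8, 4 * 2, rng))
print(out.shape)  # (8, 4, 2)
```

Because the block preserves the shape of the RoI feature, a module of this kind could in principle sit between RoI pooling and the downstream heads of a Faster R-CNN style pipeline, which is consistent with the abstract's claim that ARM is easy to integrate; where exactly the authors place it is not stated here.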
Related papers
- Robust Collaborative Perception without External Localization and Clock Devices [52.32342059286222]
A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception.
Traditional methods depend on external devices to provide localization and clock signals.
We propose a novel approach: aligning by recognizing the inherent geometric patterns within the perceptual data of various agents.
arXiv Detail & Related papers (2024-05-05T15:20:36Z)
- Transferring Modality-Aware Pedestrian Attentive Learning for
Visible-Infrared Person Re-identification [43.05147831905626]
We propose a novel Transferring Modality-Aware Pedestrian Attentive Learning (TMPA) model.
TMPA focuses on the pedestrian regions to efficiently compensate for missing modality-specific features.
Experiments conducted on the benchmark SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed TMPA model.
arXiv Detail & Related papers (2023-12-12T07:15:17Z)
- Learning Cross-modality Information Bottleneck Representation for
Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z)
- FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection [4.534713782093219]
A novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems.
FGAHOI comprises three dedicated components, namely multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM).
arXiv Detail & Related papers (2023-01-08T03:53:50Z)
- Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially
Observable Environments [9.067091068256747]
We propose a novel network structure called hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability.
Based on the above technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HRGN.
arXiv Detail & Related papers (2021-09-05T09:51:25Z)
- Region attention and graph embedding network for occlusion objective
class-based micro-expression recognition [26.5638344747854]
Micro-expression recognition (MER) has attracted much research attention over the past decade.
This paper investigates an interesting but unexplored challenge in MER, i.e., occlusion MER.
A Region-inspired Relation Reasoning Network (RRRN) is proposed to model relations between various facial regions.
arXiv Detail & Related papers (2021-07-13T08:04:03Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for
Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)
- DIRV: Dense Interaction Region Voting for End-to-End Human-Object
Interaction Detection [53.40028068801092]
We propose a novel one-stage HOI detection approach based on a new concept called interaction region for the HOI problem.
Unlike previous methods, our approach concentrates on the densely sampled interaction regions across different scales for each human-object pair.
In order to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy.
arXiv Detail & Related papers (2020-10-02T13:57:58Z)
- Landmark Guidance Independent Spatio-channel Attention and Complementary
Context Information based Facial Expression Recognition [5.076419064097734]
Modern facial expression recognition (FER) architectures rely on external sources like landmark detectors for defining attention.
In this work, an end-to-end architecture for FER is proposed that obtains both local and global attention per channel per spatial location.
The robustness and superior performance of the proposed model are demonstrated on both in-lab and in-the-wild datasets.
arXiv Detail & Related papers (2020-07-20T17:33:32Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for
Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA) module.
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.