An End-to-End Transformer Model for Crowd Localization
- URL: http://arxiv.org/abs/2202.13065v1
- Date: Sat, 26 Feb 2022 05:21:30 GMT
- Title: An End-to-End Transformer Model for Crowd Localization
- Authors: Dingkang Liang, Wei Xu, Xiang Bai
- Abstract summary: Crowd localization, predicting head positions, is a more practical and high-level task than simply counting.
Existing methods employ pseudo-bounding boxes or pre-designed localization maps, relying on complex post-processing to obtain the head positions.
We propose an elegant, end-to-end Crowd Localization TRansformer that solves the task in the regression-based paradigm.
- Score: 64.15335535775883
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Crowd localization, predicting head positions, is a more practical and
high-level task than simply counting. Existing methods employ pseudo-bounding
boxes or pre-designed localization maps, relying on complex post-processing to
obtain the head positions. In this paper, we propose an elegant, end-to-end
Crowd Localization TRansformer named CLTR that solves the task in the
regression-based paradigm. The proposed method views crowd localization as
a direct set prediction problem, taking extracted features and trainable
embeddings as input to the transformer decoder. To achieve good matching
results, we introduce a KMO-based Hungarian matcher, which innovatively revisits
label assignment from a context view instead of an independent instance view.
Extensive experiments conducted on five datasets in various data settings show
the effectiveness of our method. In particular, the proposed method achieves
the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech
Part A datasets.
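As a rough illustration of the set-prediction formulation, the sketch below matches predicted head points to ground-truth points with the Hungarian algorithm over a confidence-aware L2 cost. This is a minimal approximation: CLTR's KMO-based matcher additionally folds neighborhood context into the cost, and the exact formulation is given in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_points(pred_points, gt_points, pred_conf):
    """Match predicted head points to ground-truth points.

    pred_points: (N, 2) predicted (x, y) coordinates
    gt_points:   (M, 2) ground-truth coordinates, M <= N
    pred_conf:   (N,) predicted confidence scores in [0, 1]
    """
    # Pairwise L2 distance between predictions and ground truth.
    dist = np.linalg.norm(pred_points[:, None, :] - gt_points[None, :, :], axis=-1)
    # Penalize assigning low-confidence predictions to real heads.
    cost = dist - np.log(pred_conf + 1e-8)[:, None]
    # Hungarian (bipartite) assignment over the rectangular cost matrix;
    # the KMO-based matcher augments this cost with neighborhood context.
    row_ind, col_ind = linear_sum_assignment(cost)
    return row_ind, col_ind
```

In this paradigm, matched predictions are regressed toward their assigned ground-truth points while unmatched ones are pushed toward a background (no-head) class, which is what removes the need for map-based post-processing.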
Related papers
- Dual-Personalizing Adapter for Federated Foundation Models [35.863585349109385]
We propose a new setting, termed test-time personalization, which concentrates on the targeted local task and extends to other tasks that exhibit test-time distribution shifts.
Specifically, we propose a dual-personalizing adapter architecture (FedDPA), comprising a global adapter that addresses test-time distribution shifts and a local adapter for personalization.
The effectiveness of the proposed method has been evaluated on benchmark datasets across different NLP tasks.
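As a loose illustration of the dual-adapter idea, here is a minimal PyTorch sketch in which a frozen representation is refined by a shared global adapter and a client-specific local adapter; the module name, bottleneck size, and fixed mixing weight are assumptions, not FedDPA's exact design.

```python
import torch.nn as nn

class DualAdapterLayer(nn.Module):
    """Illustrative dual-adapter block (names are hypothetical): a frozen
    backbone representation is augmented with a shared global adapter and
    a client-specific local adapter, and their outputs are mixed."""
    def __init__(self, dim, bottleneck=16, alpha=0.5):
        super().__init__()
        self.global_adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.local_adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.alpha = alpha  # mixing weight between the two adapters (assumed)

    def forward(self, h):
        # Residual adapter outputs on top of the frozen representation h.
        return h + self.alpha * self.global_adapter(h) \
                 + (1 - self.alpha) * self.local_adapter(h)
```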
arXiv Detail & Related papers (2024-03-28T08:19:33Z)
- Accelerating Federated Learning by Selecting Beneficial Herd of Local Gradients [40.84399531998246]
Federated Learning (FL) is a distributed machine learning framework in communication network systems.
Non-Independent and Identically Distributed (Non-IID) data negatively affect the convergence efficiency of the global model.
We propose the BHerd strategy which selects a beneficial herd of local gradients to accelerate the convergence of the FL model.
arXiv Detail & Related papers (2024-03-25T09:16:59Z)
- Learning Saliency From Fixations [0.9208007322096533]
We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps.
Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the SALICON and MIT300 benchmarks.
arXiv Detail & Related papers (2023-11-23T16:04:41Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
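A minimal sketch of cluster-level pseudo-labelling, assuming target features are grouped with k-means and each cluster votes with the model's soft predictions; the function and voting rule are illustrative rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_pseudo_labels(features, probs, n_clusters):
    """Assign one pseudo-label per cluster instead of per sample.

    features: (N, D) target-domain features from the pretrained encoder
    probs:    (N, C) per-sample class probabilities from the source model
    """
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    labels = np.empty(len(features), dtype=int)
    for c in range(n_clusters):
        mask = clusters == c
        # Each cluster votes: average its members' soft predictions and
        # give every member the winning class as its pseudo-label.
        labels[mask] = probs[mask].mean(axis=0).argmax()
    return labels
```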
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- FedAvg with Fine Tuning: Local Updates Lead to Representation Learning [54.65133770989836]
The Federated Averaging (FedAvg) algorithm alternates between a few local gradient updates at client nodes and a model-averaging update at the server.
We show that the generalizability of FedAvg's output stems from its ability to learn the common data representation shared across the clients' tasks.
We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
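For concreteness, here is a minimal NumPy sketch of one FedAvg round on a least-squares model, showing the local-update/server-averaging alternation described above; the model, step size, and step count are illustrative.

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One FedAvg round for a linear least-squares model (illustrative).

    global_w:    current global weight vector, shape (d,)
    client_data: list of (X, y) pairs, one per participating client
    """
    updates, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):          # a few local gradient steps
            grad = X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        updates.append(w)
        sizes.append(len(y))
    # Server step: average local models, weighted by client dataset size.
    return np.average(updates, axis=0, weights=np.asarray(sizes, float))
```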
arXiv Detail & Related papers (2022-05-27T00:55:24Z)
- Masked Transformer for Neighbourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which puts this task into a Heterogeneous Information Network (HIN) setting.
To enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes.
Comprehensive experiments on two real-world datasets show that our proposed method significantly outperforms state-of-the-art CTR models.
arXiv Detail & Related papers (2022-01-25T12:44:23Z)
- On Second-order Optimization Methods for Federated Learning [59.787198516188425]
We evaluate the performance of several second-order distributed methods with local steps in the federated learning setting.
We propose a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
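A minimal sketch of that idea, assuming a hypothetical client interface exposing grad, hessian, and loss; averaging local Newton directions and picking a step size by a grid-based global line search is an illustration, not the paper's exact variant.

```python
import numpy as np

def second_order_round(w, clients, step_grid=(1.0, 0.5, 0.25, 0.1)):
    """One round with second-order local steps and a global line search.

    w:       current global parameters, shape (d,)
    clients: objects exposing grad(w), hessian(w), and loss(w)
             (hypothetical interface for illustration)
    """
    # Each client proposes a local Newton direction H^-1 g.
    directions = [np.linalg.solve(c.hessian(w), c.grad(w)) for c in clients]
    d = np.mean(directions, axis=0)

    # Global line search: pick the step size that most reduces the
    # average loss, counteracting overly client-specific directions.
    def global_loss(v):
        return np.mean([c.loss(v) for c in clients])

    best = min(step_grid, key=lambda t: global_loss(w - t * d))
    return w - best * d
```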
arXiv Detail & Related papers (2021-09-06T12:04:08Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
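A minimal sketch of full-text plausibility scoring with an off-the-shelf causal language model (using GPT-2 is an assumption here; the paper's scoring method may differ in detail):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def plausibility(text):
    """Score a fully written-out answer candidate by its average
    token log-likelihood under a pretrained language model."""
    ids = tok(text, return_tensors="pt").input_ids
    loss = lm(ids, labels=ids).loss  # mean negative log-likelihood
    return -loss.item()              # higher = more plausible

# Rank candidates by plausibility of the full-text statement.
candidates = ["You water a plant so it can grow.",
              "You water a plant so it can fly."]
best = max(candidates, key=plausibility)
```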
arXiv Detail & Related papers (2020-04-29T10:54:40Z)