Enhancing Landmark Detection in Cluttered Real-World Scenarios with
Vision Transformers
- URL: http://arxiv.org/abs/2308.13671v1
- Date: Fri, 25 Aug 2023 21:01:01 GMT
- Title: Enhancing Landmark Detection in Cluttered Real-World Scenarios with
Vision Transformers
- Authors: Mohammad Javad Rajabi, Morteza Mirzai, Ahmad Nickabadi
- Abstract summary: This research contributes to the advancement of landmark detection in visual place recognition.
It shows the potential of leveraging vision transformers to overcome challenges posed by cluttered real-world scenarios.
- Score: 2.900522306460408
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual place recognition tasks often encounter significant challenges in
landmark detection due to the presence of irrelevant objects such as humans,
cars, and trees, despite the remarkable progress achieved by previous models,
especially in the context of transformers. To address this issue, we propose a
novel method that effectively leverages the strengths of vision transformers.
By employing a meticulous selection process, our approach identifies and
isolates specific patches within the image that correspond to occluding
objects. To evaluate the efficacy of our method, we created augmented datasets
and conducted comprehensive testing. The results demonstrate the superior
accuracy achieved by our proposed approach. This research contributes to the
advancement of landmark detection in visual place recognition and shows the
potential of leveraging vision transformers to overcome challenges posed by
cluttered real-world scenarios.
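The abstract does not say how occluding patches are selected, so the following is only an illustrative sketch and not the authors' procedure. It assumes a binary occluder mask is already available (for example from an off-the-shelf segmentation model) and flags every ViT-style patch whose occluder coverage reaches a threshold. The names `occluded_patches`, `keep_token_indices`, and `min_cover` are hypothetical.

```python
from typing import List, Tuple

# Assumption: 1 marks a pixel of an occluding object (human, car, tree), 0 anything else.
Mask = List[List[int]]


def occluded_patches(mask: Mask, patch: int, min_cover: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) grid indices of patches whose occluder coverage is >= min_cover."""
    height, width = len(mask), len(mask[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    selected = []
    for pr in range(height // patch):
        for pc in range(width // patch):
            # Count occluder pixels inside this patch.
            covered = sum(
                mask[y][x]
                for y in range(pr * patch, (pr + 1) * patch)
                for x in range(pc * patch, (pc + 1) * patch)
            )
            if covered / (patch * patch) >= min_cover:
                selected.append((pr, pc))
    return selected


def keep_token_indices(mask: Mask, patch: int, min_cover: float = 0.5) -> List[int]:
    """Row-major indices of the patch tokens that are not flagged as occluded."""
    rows, cols = len(mask) // patch, len(mask[0]) // patch
    dropped = {r * cols + c for r, c in occluded_patches(mask, patch, min_cover)}
    return [i for i in range(rows * cols) if i not in dropped]


if __name__ == "__main__":
    demo_mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    print(occluded_patches(demo_mask, patch=2))    # patches to isolate
    print(keep_token_indices(demo_mask, patch=2))  # tokens left for the landmark model
```

The kept indices could then be used to drop or mask the corresponding patch tokens before they reach the transformer. Whether the paper removes, masks, or down-weights those patches is not stated in the abstract.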
Related papers
- LEAP:D - A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection [2.1233286062376497]
We introduce an innovative vision-language approach using learnable prompts.
This shift from conventional manual prompts aims to reduce domain-specific knowledge interference.
We streamline the training process with a one-step approach, updating the learnable prompt concurrently with model training.
(2024-11-14T04:39:10Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
(2024-08-01T07:40:00Z)
- DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake Detection [67.3143177137102]
Deepfake detection refers to detecting artificially generated or edited faces in images or videos.
We propose a novel Deepfake detection framework named DeepFidelity to adaptively distinguish real and fake faces.
(2023-12-07T07:19:45Z)
- Improved TokenPose with Sparsity [0.0]
We introduce sparsity in both keypoint token attention and visual token attention to improve human pose estimation.
Experimental results on the MPII dataset demonstrate that our model achieves higher accuracy and confirm the feasibility of the method.
(2023-11-16T08:12:34Z)
- Fusing Pseudo Labels with Weak Supervision for Dynamic Traffic Scenarios [0.0]
We introduce a weakly-supervised label unification pipeline that amalgamates pseudo labels from object detection models trained on heterogeneous datasets.
Our pipeline engenders a unified label space through the amalgamation of labels from disparate datasets, rectifying bias and enhancing generalization.
We retrain a solitary object detection model using the merged label space, culminating in a resilient model proficient in dynamic traffic scenarios.
(2023-08-30T11:33:07Z)
- Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes as measured by segmentation metrics on several multi-object benchmarks.
(2022-10-25T16:39:49Z)
- Vision Transformers for Action Recognition: A Survey [41.69370782177517]
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have proven the efficacy of transformers beyond the image domain to solve numerous video-related tasks.
Human action recognition is receiving special attention from the research community due to its widespread applications.
(2022-09-13T02:57:05Z)
- Towards Accurate Facial Landmark Detection via Cascaded Transformers [14.74021483826222]
We propose an accurate facial landmark detector based on cascaded transformers.
With self-attention in transformers, our model can inherently exploit the structured relationships between landmarks.
During cascaded refinement, our model is able to extract the most relevant image features around the target landmark for coordinate prediction.
(2022-08-23T08:42:13Z)
- Visualizing and Understanding Patch Interactions in Vision Transformer [96.70401478061076]
Vision Transformer (ViT) has become a leading tool in various computer vision tasks.
We propose a novel explainable visualization approach to analyze and interpret the crucial attention interactions among patches for vision transformer.
(2022-03-11T13:48:11Z)
- Detect and Locate: A Face Anti-Manipulation Approach with Semantic and Noise-level Supervision [67.73180660609844]
We propose a conceptually simple but effective method to efficiently detect forged faces in an image.
The proposed scheme relies on a segmentation map that delivers meaningful high-level semantic information clues about the image.
The proposed model achieves state-of-the-art detection accuracy and remarkable localization performance.
(2021-07-13T02:59:31Z)
- Pretrained equivariant features improve unsupervised landmark discovery [69.02115180674885]
We formulate a two-step unsupervised approach that overcomes this challenge by first learning powerful pixel-based features.
Our method produces state-of-the-art results in several challenging landmark detection datasets.
(2021-04-07T05:42:11Z)