Conditioning Covert Geo-Location (CGL) Detection on Semantic Class
Information
- URL: http://arxiv.org/abs/2211.14750v1
- Date: Sun, 27 Nov 2022 07:21:59 GMT
- Title: Conditioning Covert Geo-Location (CGL) Detection on Semantic Class
Information
- Authors: Binoy Saha, Sukhendu Das
- Abstract summary: A task for the identification of potential hideouts, termed Covert Geo-Location (CGL) detection, was proposed by Saha et al.
No attempt was made to utilize semantic class information, which is crucial for CGL detection.
In this paper, we propose a multitask-learning-based approach to achieve two goals: (i) extraction of features carrying semantic class information; (ii) robust training of the common encoder, exploiting large standard annotated datasets as the training set for the auxiliary task (semantic segmentation).
- Score: 5.660207256468971
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The primary goal of artificial intelligence is to mimic humans. Therefore, to
advance toward this goal, the AI community attempts to imitate qualities/skills
possessed by humans and to instill them into machines with the help of
datasets/tasks. Many tasks that require knowledge about the objects present in
an image have already been solved satisfactorily by vision models. Recently,
with the aim of incorporating knowledge about non-object image regions
(hideouts, turns, and other obscured regions), a task for the identification of
potential hideouts, termed Covert Geo-Location (CGL) detection, was proposed by
Saha et al. It involves identifying image regions that have the potential
either to cause an imminent threat or to serve as target zones to be accessed
for further investigation to identify any occluded objects. Only occluding
items belonging to certain semantic classes can give rise to CGLs. This fact
was overlooked by Saha et al., and no attempt was made to utilize semantic
class information, which is crucial for CGL detection. In this paper, we
propose a multitask-learning-based approach to achieve two goals: (i)
extraction of features carrying semantic class information; (ii) robust
training of the common encoder, exploiting large standard annotated datasets as
the training set for the auxiliary task (semantic segmentation). To explicitly
incorporate class information into the features extracted by the encoder, we
further employ an attention mechanism in a novel manner. We also propose a
better evaluation metric for CGL detection that gives more weight to
recognition than to precise localization. Experimental evaluations on the CGL
dataset demonstrate significant performance gains of about 3% to 14% mIoU and
3% to 16% DaR on split 1, and 1% mIoU and 1% to 2% DaR on split 2, over the
SOTA, testifying to the superiority of our approach.
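To make the proposed setup concrete, below is a minimal PyTorch sketch of the multitask idea described in the abstract: a common encoder shared by a CGL detection head and an auxiliary semantic segmentation head, with the predicted class maps acting as an attention gate over the shared features. The layer sizes, the gating design, the loss weight `aux_weight`, and the names `MultitaskCGLNet`/`multitask_loss` are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskCGLNet(nn.Module):
    """Illustrative sketch (not the paper's exact model): a common encoder
    shared by a CGL detection head and an auxiliary semantic segmentation
    head, with semantic predictions attending over the shared features."""

    def __init__(self, num_seg_classes: int, feat_dim: int = 256):
        super().__init__()
        # Shared encoder (stand-in for a real backbone such as a ResNet).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Auxiliary head: per-pixel semantic class logits.
        self.seg_head = nn.Conv2d(feat_dim, num_seg_classes, 1)
        # Attention: map class logits to a per-channel, per-pixel gate so that
        # CGL features are explicitly conditioned on semantic classes.
        self.attn = nn.Conv2d(num_seg_classes, feat_dim, 1)
        # Main head: per-pixel CGL vs. non-CGL logits.
        self.cgl_head = nn.Conv2d(feat_dim, 1, 1)

    def forward(self, x):
        feats = self.encoder(x)                       # (N, feat_dim, H/4, W/4)
        seg_logits = self.seg_head(feats)             # (N, classes, H/4, W/4)
        gate = torch.sigmoid(self.attn(seg_logits))   # class-conditioned attention
        cgl_logits = self.cgl_head(feats * gate)      # (N, 1, H/4, W/4)
        return cgl_logits, seg_logits

def multitask_loss(cgl_logits, cgl_target, seg_logits, seg_target, aux_weight=0.5):
    # The auxiliary segmentation term can be driven by large standard annotated
    # datasets, regularizing the shared encoder for the smaller CGL dataset.
    # Targets are assumed to be at the feature resolution: cgl_target is a
    # float mask (N, 1, h, w); seg_target holds class indices (N, h, w).
    cgl_loss = F.binary_cross_entropy_with_logits(cgl_logits, cgl_target)
    seg_loss = F.cross_entropy(seg_logits, seg_target)
    return cgl_loss + aux_weight * seg_loss
```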
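The recognition-weighted metric (DaR) reported above is not defined in this summary. As a hedged illustration only, a metric in that spirit could count a ground-truth CGL region as detected once the prediction covers enough of it, rewarding recognition of the region rather than precise boundary localization; the function name `detection_rate` and the `cover_thresh` default below are hypothetical.

```python
import numpy as np
from scipy import ndimage

def detection_rate(pred_mask: np.ndarray, gt_mask: np.ndarray,
                   cover_thresh: float = 0.5) -> float:
    """Fraction of ground-truth CGL regions that are 'recognized'.
    A region counts as detected if the (boolean) prediction covers at
    least cover_thresh of its pixels; boundary precision is not rewarded.
    Illustrative stand-in, not the paper's exact DaR definition."""
    gt_regions, n_regions = ndimage.label(gt_mask.astype(bool))
    if n_regions == 0:
        return 1.0  # nothing to detect
    detected = 0
    for region_id in range(1, n_regions + 1):
        region = gt_regions == region_id
        coverage = np.logical_and(region, pred_mask).sum() / region.sum()
        if coverage >= cover_thresh:
            detected += 1
    return detected / n_regions
```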
Related papers
- Adaptive Masking Enhances Visual Grounding [12.793586888511978]
We propose IMAGE, Interpretative MAsking with Gaussian radiation modEling, to enhance vocabulary grounding in low-shot learning scenarios.
We evaluate the efficacy of our approach on benchmark datasets, including COCO and ODinW, demonstrating its superior performance in zero-shot and few-shot tasks.
arXiv Detail & Related papers (2024-10-04T05:48:02Z)
- Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness [44.15562068190958]
In the operating room (OR), semantic segmentation is at the core of making robots aware of their clinical surroundings.
State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable.
We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras.
arXiv Detail & Related papers (2024-07-07T17:17:52Z)
- SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation [20.29438820908913]
Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations.
We introduce SCE-MAE, a framework that operates on the vanilla feature map instead of on expensive hypercolumns.
We demonstrate through experiments that SCE-MAE is highly effective and robust, outperforming existing SOTA methods by large margins.
arXiv Detail & Related papers (2024-05-28T16:14:10Z)
- Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification [0.5572976467442564]
The work described in this paper uses semantic information obtained from both object detection and semantic segmentation techniques.
A novel approach is proposed that uses a semantic segmentation mask to provide a Hu-moments-based shape characterization of segmentation categories, designated Hu-Moments Features (SHMFs).
A three-main-branch network, designated GOS$2$F$2$App, which exploits deep-learning-based global features, object-based features, and semantic-segmentation-based features, is also proposed.
arXiv Detail & Related papers (2024-04-11T13:37:51Z)
- Optimization Efficient Open-World Visual Region Recognition [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.
Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning [62.52340838817908]
We introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization.
We aim to predict the actual human gaze location to obtain the visual attention regions for recognizing a novel object guided by attribute descriptions.
This work suggests the promising benefits of collecting human gaze datasets and of automatic gaze estimation algorithms for high-level computer vision tasks.
arXiv Detail & Related papers (2021-03-05T02:14:57Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Improving Point Cloud Semantic Segmentation by Learning 3D Object Detection [102.62963605429508]
Point cloud semantic segmentation plays an essential role in autonomous driving.
Current 3D semantic segmentation networks focus on convolutional architectures that perform well for well-represented classes.
We propose a novel Detection Aware 3D Semantic Segmentation (DASS) framework that explicitly leverages localization features from an auxiliary 3D object detection task.
arXiv Detail & Related papers (2020-09-22T14:17:40Z)
- Grounded Situation Recognition [56.18102368133022]
We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images.
GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities.
We show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval.
arXiv Detail & Related papers (2020-03-26T17:57:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.