Bounding Boxes Are All We Need: Street View Image Classification via
Context Encoding of Detected Buildings
- URL: http://arxiv.org/abs/2010.01305v2
- Date: Mon, 12 Oct 2020 05:52:29 GMT
- Title: Bounding Boxes Are All We Need: Street View Image Classification via
Context Encoding of Detected Buildings
- Authors: Kun Zhao, Yongkun Liu, Siyuan Hao, Shaoxing Lu, Hongbin Liu, Lijian
Zhou
- Abstract summary: A "Detector-Encoder-Classifier" framework is proposed.
"BEAUTY" dataset can be used not only for street view image classification, but also for multi-class building detection.
- Score: 7.1235778791928634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Street view image classification aimed at urban land use analysis is
difficult because the class labels (e.g., commercial area) are concepts at a
higher level of abstraction than those of general visual tasks (e.g.,
persons and cars). Therefore, classification models using only visual features
often fail to achieve satisfactory performance. In this paper, a novel approach
based on a "Detector-Encoder-Classifier" framework is proposed. Instead of
using visual features of the whole image directly as common image-level models
based on convolutional neural networks (CNNs) do, the proposed framework
first obtains the bounding boxes of buildings in street view images from a
detector. Their contextual information such as the co-occurrence patterns of
building classes and their layout are then encoded into metadata by the
proposed algorithm "CODING" (Context encOding of Detected buildINGs). Finally,
these bounding box metadata are classified by a recurrent neural network (RNN).
In addition, we built a dual-labeled dataset named "BEAUTY" (Building dEtection
And Urban funcTional-zone portraYing) of 19,070 street view images and 38,857
buildings based on the existing BIC GSV [1]. The dataset can be used not only
for street view image classification, but also for multi-class building
detection. Experiments on "BEAUTY" show that the proposed approach achieves a
12.65% performance improvement on macro-precision and 12% on macro-recall over
image-level CNN based models. Our code and dataset are available at
https://github.com/kyle-one/Context-Encoding-of-Detected-Buildings/
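The pipeline the abstract describes (detect buildings, encode their bounding-box metadata, classify the resulting sequence with an RNN) can be sketched roughly as follows. This is an illustrative approximation only, not the authors' CODING implementation: the building class set, the one-hot-plus-layout feature encoding, the left-to-right ordering, and the plain tanh RNN with untrained weights are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical building class set -- the actual BEAUTY label set differs.
BUILDING_CLASSES = ["apartment", "house", "office", "store"]

def encode_detections(boxes, img_w, img_h):
    """Turn detected building boxes into a sequence of metadata vectors,
    loosely in the spirit of the CODING step: each building becomes a
    one-hot class label plus its normalized layout (x, y, w, h), and the
    sequence is ordered left to right so a recurrent model can read the
    street layout. `boxes` holds (class_name, x, y, w, h) tuples in pixels."""
    ordered = sorted(boxes, key=lambda b: b[1])  # sort by horizontal position
    seq = []
    for cls, x, y, w, h in ordered:
        one_hot = [1.0 if c == cls else 0.0 for c in BUILDING_CLASSES]
        layout = [x / img_w, y / img_h, w / img_w, h / img_h]
        seq.append(one_hot + layout)
    return np.array(seq)

def rnn_classify(seq, Wxh, Whh, Who):
    """Run a plain tanh RNN over the box sequence and return the index of
    the predicted functional-zone class (a stand-in for the paper's RNN)."""
    h = np.zeros(Whh.shape[0])
    for x in seq:
        h = np.tanh(Wxh @ x + Whh @ h)
    return int(np.argmax(Who @ h))

# Toy demo with random (untrained) weights, just to show the data flow.
rng = np.random.default_rng(0)
n_feat = len(BUILDING_CLASSES) + 4
n_hidden, n_zones = 16, 3  # 3 hypothetical zone labels for the demo
Wxh = 0.1 * rng.normal(size=(n_hidden, n_feat))
Whh = 0.1 * rng.normal(size=(n_hidden, n_hidden))
Who = 0.1 * rng.normal(size=(n_zones, n_hidden))

boxes = [("store", 300, 50, 80, 120), ("house", 10, 40, 60, 90)]
seq = encode_detections(boxes, img_w=640, img_h=480)
pred = rnn_classify(seq, Wxh, Whh, Who)
```

The point of the encoding is that the classifier never sees pixels: class co-occurrence and spatial layout alone carry the functional-zone signal, which is why the paper's title calls bounding boxes "all we need".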
Related papers
- Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Segmentation of Roads in Satellite Images using specially modified U-Net CNNs [0.0]
The aim of this paper is to build an image classifier for satellite images of urban scenes that identifies the portions of the images in which a road is located.
Unlike conventional computer vision algorithms, convolutional neural networks (CNNs) provide accurate and reliable results on this task.
arXiv Detail & Related papers (2021-09-29T19:08:32Z)
- Graph Attention Layer Evolves Semantic Segmentation for Road Pothole Detection: A Benchmark and Algorithms [34.80667966432126]
Existing road pothole detection approaches can be classified as computer vision-based or machine learning-based.
The latter approaches generally address road pothole detection using convolutional neural networks (CNNs) in an end-to-end manner.
We propose a novel CNN layer, referred to as graph attention layer (GAL), which can be easily deployed in any existing CNN to optimize image feature representations for semantic segmentation.
arXiv Detail & Related papers (2021-09-06T19:44:50Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks using the labels of the Places dataset, first consider only the visual part.
Taking the texts associated with the images into account can improve accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms the supervised method by 1.6/1.2/1.0 AP on Mask R-CNN-C4/FPN/RetinaNet with the 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z)
- Convolutional Neural Networks from Image Markers [62.997667081978825]
Feature Learning from Image Markers (FLIM) was recently proposed to estimate convolutional filters, with no backpropagation, from strokes drawn by a user on very few images.
This paper extends FLIM for fully connected layers and demonstrates it on different image classification problems.
The results show that FLIM-based convolutional neural networks can outperform the same architecture trained from scratch by backpropagation.
arXiv Detail & Related papers (2020-12-15T22:58:23Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
- Automatic Signboard Detection and Localization in Densely Populated Developing Cities [0.0]
Signboard detection in natural scene images is the foremost task for error-free information retrieval.
We present a novel object detection approach that can detect signboards automatically and is suitable for such cities.
Our proposed method can detect signboards accurately, achieving 0.90 mAP (mean average precision) even when images contain multiple signboards with diverse shapes and colours against a noisy background.
arXiv Detail & Related papers (2020-03-04T08:04:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.