Region Embedding with Intra and Inter-View Contrastive Learning
- URL: http://arxiv.org/abs/2211.08975v1
- Date: Tue, 15 Nov 2022 10:57:20 GMT
- Title: Region Embedding with Intra and Inter-View Contrastive Learning
- Authors: Liang Zhang, Cheng Long, and Gao Cong
- Abstract summary: Unsupervised region representation learning aims to extract dense and effective features from unlabeled urban data.
Motivated by the success of contrastive learning for representation learning, we propose to leverage it for multi-view region representation learning.
We design the intra-view contrastive learning module, which helps to learn distinguishable region embeddings, and the inter-view contrastive learning module, which serves as a soft co-regularizer.
- Score: 29.141194278469417
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised region representation learning aims to extract dense and
effective features from unlabeled urban data. While some efforts have been made
to solve this problem using multiple views, existing methods remain insufficient
at extracting representations within a view and/or at incorporating
representations from different views. Motivated by the success of contrastive
learning for representation learning, we propose to leverage it for multi-view
region representation learning and design a model called ReMVC (Region
Embedding with Multi-View Contrastive Learning) by following two guidelines: i)
comparing a region with others within each view for effective representation
extraction and ii) comparing a region with itself across different views for
cross-view information sharing. We design an intra-view contrastive learning
module, which helps to learn distinguishable region embeddings, and an inter-view
contrastive learning module, which serves as a soft co-regularizer to constrain
the embedding parameters and transfer knowledge across views. We exploit
the learned region embeddings in two downstream tasks: land usage
clustering and region popularity prediction. Extensive experiments demonstrate
that our model achieves impressive improvements over seven
state-of-the-art baseline methods, with margins of over 30% in the land
usage clustering task.
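The paper's code is not reproduced here; as a rough illustration of the two loss terms the abstract describes, below is a minimal PyTorch-style sketch. The function names, the InfoNCE formulation, the positive-sampling scheme, and the weighting factor `alpha` are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """Generic InfoNCE loss: row i of `anchor` is pulled toward row i of
    `positive` and pushed away from every other row, which acts as an
    in-batch negative. Both inputs are (batch, dim) embedding matrices."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature  # pairwise cosine similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

def remvc_style_loss(z_a, z_a_pos, z_b, z_b_pos, alpha=1.0):
    """Combined objective in the spirit of the abstract:
    - two intra-view terms contrast each region against the other regions
      within its own view (here the positives are assumed to be perturbed
      or resampled embeddings of the same region in that view);
    - one inter-view term contrasts a region with itself across the two
      views, acting as a soft co-regularizer between the view encoders."""
    intra = info_nce(z_a, z_a_pos) + info_nce(z_b, z_b_pos)
    inter = info_nce(z_a, z_b)
    return intra + alpha * inter

# Toy usage: 32 regions with 64-dim embeddings from two hypothetical view encoders.
z_a = torch.randn(32, 64, requires_grad=True)
z_b = torch.randn(32, 64, requires_grad=True)
loss = remvc_style_loss(z_a, z_a + 0.01 * torch.randn(32, 64),
                        z_b, z_b + 0.01 * torch.randn(32, 64))
loss.backward()
```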
Related papers
- Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations.
arXiv Detail & Related papers (2024-02-18T12:43:38Z)
- Urban Region Embedding via Multi-View Contrastive Prediction [22.164358462563996]
We form a new pipeline to learn consistent representations across varying views.
Our model outperforms state-of-the-art baseline methods significantly in urban region representation learning.
arXiv Detail & Related papers (2023-12-15T10:53:09Z)
- Attentive Graph Enhanced Region Representation Learning [7.4106801792345705]
Representing urban regions accurately and comprehensively is essential for various urban planning and analysis tasks.
We propose the Attentive Graph Enhanced Region Representation Learning (ATGRL) model, which aims to capture comprehensive dependencies from multiple graphs and learn rich semantic representations of urban regions.
arXiv Detail & Related papers (2023-07-06T16:38:43Z)
- Minimum Class Confusion based Transfer for Land Cover Segmentation in Rural and Urban Regions [0.0]
We present a semantic segmentation method that produces land cover maps using transfer learning.
We compare models trained on low-resolution images with insufficient data for the targeted region or zoom level.
Experiments showed that transfer learning improves segmentation performance by 3.4% MIoU (Mean Intersection over Union) in rural regions and by 12.9% MIoU in urban regions.
arXiv Detail & Related papers (2022-12-05T09:41:06Z)
- Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z)
- Point-Level Region Contrast for Object Detection Pre-Training [147.47349344401806]
We present point-level region contrast, a self-supervised pre-training approach for the task of object detection.
Our approach performs contrastive learning by directly sampling individual point pairs from different regions.
Compared to an aggregated representation per region, our approach is more robust to the change in input region quality.
arXiv Detail & Related papers (2022-02-09T18:56:41Z)
- RegionCL: Can Simple Region Swapping Contribute to Contrastive Learning? [76.16156833138038]
We propose a simple yet effective pretext task called Region Contrastive Learning (RegionCL); a minimal sketch of the swap operation appears after this list.
Specifically, given two different images, we randomly crop a same-sized region from each and swap the crops, composing two new images together with the remaining regions.
RegionCL exploits these abundant pairs and helps the model distinguish region features from both the canvas and paste views.
arXiv Detail & Related papers (2021-11-24T07:19:46Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationships between neighborhoods (directed edges).
The embeddings we trained outperform those trained using only unimodal data as regional inputs.
arXiv Detail & Related papers (2021-05-06T07:44:05Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification aims to train models for new classes using only a limited number of labeled examples.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
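To make the region-swapping pretext task from the RegionCL entry above concrete, here is a minimal NumPy sketch. It is a loose reading of the two-sentence summary, not the authors' code; the function name and the independent sampling of crop locations are assumptions.

```python
import numpy as np

def region_swap(img_a, img_b, crop_h, crop_w, rng=None):
    """Crop one (crop_h, crop_w) patch from each image at an independently
    sampled location and exchange the patches, so each output keeps its
    original 'canvas' plus a pasted region from the other image. Assumes
    both images share the same (H, W, C) shape. Whether RegionCL shares a
    single crop location across both images is not stated in the summary;
    independent locations are an assumption here."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    ya = rng.integers(0, h - crop_h + 1)
    xa = rng.integers(0, w - crop_w + 1)
    yb = rng.integers(0, h - crop_h + 1)
    xb = rng.integers(0, w - crop_w + 1)
    out_a, out_b = img_a.copy(), img_b.copy()
    out_a[ya:ya + crop_h, xa:xa + crop_w] = img_b[yb:yb + crop_h, xb:xb + crop_w]
    out_b[yb:yb + crop_h, xb:xb + crop_w] = img_a[ya:ya + crop_h, xa:xa + crop_w]
    return out_a, out_b

# Toy usage: swap 64x64 patches between two random 224x224 RGB images.
a = np.random.rand(224, 224, 3)
b = np.random.rand(224, 224, 3)
new_a, new_b = region_swap(a, b, 64, 64)
```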