Visual Distant Supervision for Scene Graph Generation
- URL: http://arxiv.org/abs/2103.15365v1
- Date: Mon, 29 Mar 2021 06:35:24 GMT
- Title: Visual Distant Supervision for Scene Graph Generation
- Authors: Yuan Yao, Ao Zhang, Xu Han, Mengdi Li, Cornelius Weber, Zhiyuan Liu,
Stefan Wermter, Maosong Sun
- Abstract summary: Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
- Score: 66.10579690929623
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene graph generation aims to identify objects and their relations in
images, providing structured image representations that can facilitate numerous
applications in computer vision. However, scene graph models usually require
supervised learning on large quantities of labeled data with intensive human
annotation. In this work, we propose visual distant supervision, a novel
paradigm of visual relation learning, which can train scene graph models
without any human-labeled data. The intuition is that by aligning commonsense
knowledge bases and images, we can automatically create large-scale labeled
data to provide distant supervision for visual relation learning. To alleviate
the noise in distantly labeled data, we further propose a framework that
iteratively estimates the probabilistic relation labels and eliminates the
noisy ones. Comprehensive experimental results show that our distantly
supervised model outperforms strong weakly supervised and semi-supervised
baselines. By further incorporating human-labeled data in a semi-supervised
fashion, our model outperforms state-of-the-art fully supervised models by a
large margin (e.g., 8.6 micro- and 7.6 macro-recall@50 improvements for
predicate classification in Visual Genome evaluation). All the data and code
will be available to facilitate future research.
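The distant-supervision idea described above can be sketched in a few lines: align detected object pairs with knowledge-base triples to obtain noisy candidate relation labels, then iteratively filter them. This is a minimal illustrative sketch, not the authors' implementation; `distant_labels`, `denoise`, and the score function are hypothetical names and interfaces.

```python
# Hypothetical sketch of visual distant supervision: (1) align (subject, object)
# pairs detected in an image with relation triples from a commonsense KB to
# produce noisy candidate labels; (2) iteratively re-score and drop noisy ones.
from collections import defaultdict

def distant_labels(detected_pairs, kb_triples):
    """Assign every KB relation linking a detected object pair as a
    (noisy) candidate label -- the core distant-supervision alignment."""
    kb_index = defaultdict(set)
    for subj, rel, obj in kb_triples:
        kb_index[(subj, obj)].add(rel)
    labels = {}
    for subj, obj in detected_pairs:
        rels = kb_index.get((subj, obj))
        if rels:
            labels[(subj, obj)] = set(rels)
    return labels

def denoise(labels, score_fn, threshold=0.5):
    """One denoising round: keep only candidate relations whose estimated
    probability (from the current model) exceeds a threshold."""
    kept = {}
    for pair, rels in labels.items():
        good = {r for r in rels if score_fn(pair, r) >= threshold}
        if good:
            kept[pair] = good
    return kept
```

In the paper's framework the scoring model and the label set are re-estimated jointly over several iterations; the hard threshold here stands in for that probabilistic re-estimation.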
Related papers
- DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning [36.85439684013268]
We propose a novel framework named DisenSemi, which learns disentangled representation for semi-supervised graph classification.
Specifically, a disentangled graph encoder is proposed to generate factor-wise graph representations for both supervised and unsupervised models.
We train two models via supervised objective and mutual information (MI)-based constraints respectively.
arXiv Detail & Related papers (2024-07-19T07:31:32Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Generated Graph Detection [27.591612297045817]
Graph generative models become increasingly effective for data distribution approximation and data augmentation.
We propose the first framework to investigate a set of sophisticated models and their performance in four classification scenarios.
Our solution can remain effective for a considerable time in curbing misuse of generated graphs.
arXiv Detail & Related papers (2023-06-13T13:18:04Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation [20.435023745201878]
We propose a novel 3-paradigms framework that simulates how humans incorporate the label linguistic features as guidance of vision-based representations.
Our framework is model-agnostic to any scene graph model.
arXiv Detail & Related papers (2022-03-17T08:29:52Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning [21.0019144298605]
Existing graph neural networks fed with the complete graph data are not scalable due to limited computation and memory costs.
Subg-Con is proposed, utilizing the strong correlation between central nodes and their sampled subgraphs to capture regional structure information.
Compared with existing graph representation learning approaches, Subg-Con has prominent advantages in weaker supervision requirements, model learning scalability, and parallelization.
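The subgraph sampling step that Subg-Con builds on can be sketched as a bounded BFS around each central node; the sampled neighborhood then serves as the positive context in a contrastive objective. The adjacency-dict representation and function name below are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch: sample a small subgraph around a central node so its
# summary can be contrasted against the node's own embedding (Subg-Con style).
from collections import deque

def sample_subgraph(adj, center, max_nodes=4):
    """BFS from the central node, collecting up to max_nodes nodes --
    the regional context contrasted against the node itself."""
    visited = [center]
    queue = deque([center])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for nbr in adj.get(node, []):
            if nbr not in visited and len(visited) < max_nodes:
                visited.append(nbr)
                queue.append(nbr)
    return visited
```

Because each sample touches only a small neighborhood, training never loads the full graph, which is the source of the scalability advantage mentioned above.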
arXiv Detail & Related papers (2020-09-22T01:58:19Z)
- Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation [57.68890534164427]
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
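The iterative pseudo-labeling loop described in the Naive-Student entry can be sketched generically: train on human labels, pseudo-label the unlabeled pool, and retrain on the union. The `train` and `predict` callables below are placeholder interfaces, not the paper's actual pipeline.

```python
# A minimal, model-agnostic sketch of iterative semi-supervised pseudo-labeling.
def naive_student(labeled, unlabeled, train, predict, rounds=3):
    """Iteratively train a model, pseudo-label the unlabeled pool, and
    retrain on human-annotated plus pseudo-labeled data."""
    model = train(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        model = train(labeled + pseudo)
    return model
```

In practice each round typically uses a fresh (often larger) student model and stronger data augmentation than this sketch shows.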
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.