Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
- URL: http://arxiv.org/abs/2306.10122v1
- Date: Fri, 16 Jun 2023 18:14:23 GMT
- Title: Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
- Authors: Shuo Chen, Yingjun Du, Pascal Mettes, Cees G.M. Snoek
- Abstract summary: Recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
- Score: 55.429541407920304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of scene graph generation in videos with
the aim of capturing semantic relations between subjects and objects in the
form of ⟨subject, predicate, object⟩ triplets. Recognizing the
predicate between subject and object pairs is imbalanced and multi-label in
nature, ranging from ubiquitous interactions such as spatial relationships (e.g.,
in front of) to rare interactions such as twisting. In
widely-used benchmarks such as Action Genome and VidOR, the imbalance ratio
between the most and least frequent predicates reaches 3,218 and 3,408,
respectively, surpassing even benchmarks specifically designed for long-tailed
recognition. Due to the long-tailed distributions and label co-occurrences,
recent state-of-the-art methods predominantly focus on the most frequently
occurring predicate classes, ignoring those in the long tail. In this paper, we
analyze the limitations of current approaches for scene graph generation in
videos and identify a one-to-one correspondence between predicate frequency and
recall performance. As a step towards unbiased scene graph generation in
videos, we introduce a multi-label meta-learning framework to deal with the
biased predicate distribution. Our meta-learning framework learns a meta-weight
network that weights, for each training sample, the losses over all possible
labels. We evaluate
our approach on the Action Genome and VidOR benchmarks by building upon two
current state-of-the-art methods for each benchmark. The experiments
demonstrate that the multi-label meta-weight network improves the performance
for predicates in the long tail without compromising performance for head
classes, resulting in better overall performance and favorable
generalizability. Code: https://github.com/shanshuo/ML-MWN
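The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only shows the forward pass of weighting per-label losses with a small network. The names `per_label_bce`, `MetaWeightNet`, and `weighted_loss`, the one-hidden-layer architecture, and the sigmoid output are all assumptions made for illustration, and the meta-learning update that would train the weight network is omitted.

```python
import numpy as np


def per_label_bce(logits, targets):
    """Element-wise binary cross-entropy, shape (batch, num_labels)."""
    # Numerically stable form: max(x, 0) - x * y + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))


class MetaWeightNet:
    """Hypothetical one-hidden-layer MLP mapping a sample's vector of
    per-label losses to a vector of per-label weights in (0, 1)."""

    def __init__(self, num_labels, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (num_labels, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, num_labels))
        self.b2 = np.zeros(num_labels)

    def __call__(self, losses):
        hidden = np.maximum(losses @ self.w1 + self.b1, 0)  # ReLU
        return 1.0 / (1.0 + np.exp(-(hidden @ self.w2 + self.b2)))  # sigmoid


def weighted_loss(logits, targets, net):
    """Weight every (sample, label) loss and average over the batch."""
    losses = per_label_bce(logits, targets)
    weights = net(losses)
    return float((weights * losses).sum(axis=1).mean()), weights


# Toy batch: 4 subject-object pairs, 5 predicate classes, multi-hot targets.
rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 5))
targets = (rng.random((4, 5)) < 0.3).astype(float)
net = MetaWeightNet(num_labels=5)
loss, weights = weighted_loss(logits, targets, net)
print(weights.shape, loss)
```

In a full meta-learning setup, the parameters of such a weight network would be updated using a separate, small meta set, alternating with updates to the main model; that bilevel loop is deliberately left out of this sketch.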
Related papers
- Unbiased Scene Graph Generation using Predicate Similarities [7.9112365100345965]
Scene Graphs are widely applied in computer vision as a graphical representation of relationships between objects shown in images.
These applications have not yet reached a practical stage of development owing to biased training caused by long-tailed predicate distributions.
We propose a new classification scheme that branches the process to several fine-grained classifiers for similar predicate groups.
The results of extensive experiments on the Visual Genome dataset show that the combination of our method and an existing debiasing approach greatly improves performance on tail predicates in challenging SGCls/SGDet tasks.
arXiv Detail & Related papers (2022-10-03T13:28:01Z) - FAITH: Few-Shot Graph Classification with Hierarchical Task Graphs [39.576675425158754]
Few-shot graph classification aims at predicting classes for graphs, given limited labeled graphs for each class.
We propose a novel few-shot learning framework FAITH that captures task correlations via constructing a hierarchical task graph.
Experiments on four prevalent few-shot graph classification datasets demonstrate the superiority of FAITH over other state-of-the-art baselines.
arXiv Detail & Related papers (2022-05-05T04:28:32Z) - Cross-Domain Few-Shot Graph Classification [7.23389716633927]
We study the problem of few-shot graph classification across domains with nonequivalent feature spaces.
We propose an attention-based graph encoder that uses three congruent views of graphs, one contextual and two topological views.
We show that when coupled with metric-based meta-learning frameworks, the proposed encoder achieves the best average meta-test classification accuracy.
arXiv Detail & Related papers (2022-01-20T16:16:30Z) - A Graph-Based Neural Model for End-to-End Frame Semantic Parsing [12.43480002133656]
We propose an end-to-end neural model to tackle the frame semantic parsing task jointly.
We exploit a graph-based method, regarding frame semantic parsing as a graph construction problem.
Experiment results on two benchmark datasets of frame semantic parsing show that our method is highly competitive.
arXiv Detail & Related papers (2021-09-25T08:54:33Z) - Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z) - Dual ResGCN for Balanced Scene Graph Generation [106.7828712878278]
We propose a novel model, dubbed dual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, i.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information, i.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z) - Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score [65.18522219013786]
Scene graph parsing aims to detect objects in an image scene and recognize their relations.
Recent approaches have achieved high average scores on some popular benchmarks, but fail in detecting rare relations.
This paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem.
arXiv Detail & Related papers (2020-09-28T13:57:59Z) - Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method improves several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z) - Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation [132.82884193921535]
We argue that previous methods underestimate the importance of video feature learning and propose a two-stage approach.
We show that this simple baseline approach outperforms prior few-shot video classification methods by over 20 points on existing benchmarks.
We present two novel approaches that yield further improvement.
arXiv Detail & Related papers (2020-07-09T13:05:32Z)