Rethinking Batch Sample Relationships for Data Representation: A
Batch-Graph Transformer based Approach
- URL: http://arxiv.org/abs/2211.10622v1
- Date: Sat, 19 Nov 2022 08:46:50 GMT
- Title: Rethinking Batch Sample Relationships for Data Representation: A
Batch-Graph Transformer based Approach
- Authors: Xixi Wang, Bo Jiang, Xiao Wang, Bin Luo
- Abstract summary: We design a simple yet flexible Batch-Graph Transformer (BGFormer) for mini-batch sample representations.
It deeply captures the relationships of image samples from both visual and semantic perspectives.
Extensive experiments on four popular datasets demonstrate the effectiveness of the proposed model.
- Score: 16.757917001089762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploring sample relationships within each mini-batch has shown great
potential for learning image representations. Existing works generally adopt
the regular Transformer to model the visual content relationships, ignoring the
cues of semantic/label correlations between samples. They also generally adopt
the "full" self-attention mechanism, which is redundant and sensitive to noisy
samples. To overcome these issues, in this paper, we
design a simple yet flexible Batch-Graph Transformer (BGFormer) for mini-batch
sample representations by deeply capturing the relationships of image samples
from both visual and semantic perspectives. BGFormer has three main aspects.
(1) It employs a flexible graph model, termed Batch Graph, to jointly encode the
visual and semantic relationships of samples within each mini-batch. (2) It
explores the neighborhood relationships of samples by borrowing the idea of
sparse graph representation, and thus performs robustly w.r.t. noisy
samples. (3) It devises a novel Transformer architecture that mainly adopts
dual structure-constrained self-attention (SSA), together with graph
normalization, FFN, etc., to carefully exploit the batch graph information for
sample token (node) representations. As an application, we apply BGFormer to
the metric learning tasks. Extensive experiments on four popular datasets
demonstrate the effectiveness of the proposed model.
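The three aspects above can be illustrated with a rough, self-contained sketch. This is not the paper's actual implementation (which uses learned projections, graph normalization, and an FFN); it only shows the core idea of a sparse batch graph built from both visual similarity (top-k neighbours) and label agreement, followed by one attention step restricted to that graph. All names and the top-k rule are illustrative assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def batch_graph(feats, labels, k=2):
    # Sparse batch graph over one mini-batch: keep edge (i, j) if j is
    # among i's top-k visual neighbours OR shares i's label (semantic edge).
    n = len(feats)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        order = sorted(range(n), key=lambda j: -cosine(feats[i], feats[j]))
        for j in order[:k]:
            adj[i][j] = True          # visual edge
        for j in range(n):
            if labels[i] == labels[j]:
                adj[i][j] = True      # semantic edge
    return adj

def sparse_attention(feats, adj):
    # One self-attention step restricted to batch-graph edges, so
    # unrelated (possibly noisy) samples cannot influence each other.
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        scores = [cosine(feats[i], feats[j]) for j in nbrs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[t] * feats[j][c] for t, j in enumerate(nbrs)) / z
                    for c in range(d)])
    return out

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
updated = sparse_attention(feats, batch_graph(feats, labels))
```

Because attention weights are renormalised over the retained edges only, each output token is a convex combination of its graph neighbours' features.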
Related papers
- DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation [13.058196732927135]
Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image.
Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets.
We present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem.
arXiv Detail & Related papers (2024-03-21T23:43:30Z)
- On the Equivalence of Graph Convolution and Mixup [70.0121263465133]
This paper investigates the relationship between graph convolution and Mixup techniques.
Under two mild conditions, graph convolution can be viewed as a specialized form of Mixup.
We establish this equivalence mathematically by demonstrating that graph convolution networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
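A toy numeric check of this equivalence (a sketch, assuming the simplest case of uniform mixing weights and a parameter-free convolution): one SGC-style graph convolution step computes, per node, a convex combination of neighbour features, which is exactly the form of Mixup.

```python
def gcn_step(adj, feats):
    # One parameter-free graph convolution (SGC-style, weight matrix
    # dropped): row-normalised averaging over each node's neighbourhood.
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        lam = 1.0 / len(nbrs)  # uniform mixing coefficients
        out.append([sum(lam * feats[j][c] for j in nbrs) for c in range(d)])
    return out

def mixup(samples, lams):
    # Classic Mixup: a convex combination of input samples.
    d = len(samples[0])
    return [sum(l * s[c] for l, s in zip(lams, samples)) for c in range(d)]

# Node 0 is connected to nodes {0, 1}, so its convolved feature is a
# 50/50 Mixup of those two samples.
adj = [[True, True, False], [True, True, True], [False, True, True]]
feats = [[2.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
conv = gcn_step(adj, feats)
```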
arXiv Detail & Related papers (2023-09-29T23:09:54Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- GVdoc: Graph-based Visual Document Classification [17.350393956461783]
We propose GVdoc, a graph-based document classification model.
Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings.
We show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data.
arXiv Detail & Related papers (2023-05-26T19:23:20Z)
- MSVQ: Self-Supervised Learning with Multiple Sample Views and Queues [10.327408694770709]
We propose a new, simple framework, namely Multiple Sample Views and Queues (MSVQ).
We jointly construct three soft labels on-the-fly by utilizing two complementary and symmetric approaches.
The student network is made to mimic the similarity relationships between samples, giving it a more flexible ability to identify false negative samples in the dataset.
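The summary does not spell out how the soft labels are built; as a generic, hedged sketch of the underlying idea (temperature-scaled similarities between an augmented view and a queue of stored embeddings, normalised into a target distribution the student can mimic), one such label might be computed as:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def soft_label(view, queue, temperature=0.1):
    # Hypothetical soft-label construction (names are illustrative, not
    # the paper's API): similarities between one augmented view and a
    # queue of embeddings, softmax-normalised into a probability vector.
    sims = [cosine(view, q) / temperature for q in queue]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]

queue = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
label = soft_label([0.9, 0.1], queue)
```

A student trained against such distributions sees graded similarity targets rather than hard positives/negatives, which is what allows it to down-weight likely false negatives.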
arXiv Detail & Related papers (2023-05-09T12:05:14Z)
- Iterative Scene Graph Generation with Generative Transformers [6.243995448840211]
Scene graphs provide a rich, structured representation of a scene by encoding the entities (objects) and their spatial relationships in a graphical format.
Current approaches take a generation-by-classification approach where the scene graph is generated through labeling of all possible edges between objects in a scene.
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction.
arXiv Detail & Related papers (2022-11-30T00:05:44Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network, that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z)
- ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial Multi-View Clustering [52.491074276133325]
We propose an augmentation-free graph contrastive learning framework to solve the problem of partial multi-view clustering.
The proposed approach elevates instance-level contrastive learning and missing data inference to the cluster-level, effectively mitigating the impact of individual missing data on clustering.
arXiv Detail & Related papers (2022-03-01T02:32:25Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph.
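A minimal sketch of that pseudo-labelling step (the function name and exact matching rule are assumptions; the paper's concept matching is more involved): detected region classes are matched, after lowercasing, against concepts parsed from the caption, and only matched regions receive a pseudo label.

```python
def pseudo_labels(detections, caption_concepts):
    # detections: list of (region_id, class_name) pairs from an
    # off-the-shelf object detector.
    # caption_concepts: concept strings parsed from the image caption.
    # A region receives a pseudo label only when its class name matches
    # a caption concept (case-insensitive exact match, for illustration).
    concepts = {c.lower() for c in caption_concepts}
    return [(rid, cls.lower()) for rid, cls in detections
            if cls.lower() in concepts]

dets = [(0, "Dog"), (1, "car"), (2, "tree")]
matched = pseudo_labels(dets, {"dog", "tree", "frisbee"})
```

Regions whose class has no caption counterpart (here, "car") are simply left unlabeled rather than guessed.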
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.