Enhancing Social Relation Inference with Concise Interaction Graph and
Discriminative Scene Representation
- URL: http://arxiv.org/abs/2107.14425v1
- Date: Fri, 30 Jul 2021 04:20:13 GMT
- Title: Enhancing Social Relation Inference with Concise Interaction Graph and
Discriminative Scene Representation
- Authors: Xiaotian Yu, Hanling Yi, Yi Yu, Ling Xing, Shiliang Zhang, Xiaoyu Wang
- Abstract summary: We propose an approach of PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
- Score: 56.25878966006678
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been a recent surge of research interest in attacking the problem
of social relation inference based on images. Existing works classify social
relations mainly by creating complicated graphs of human interactions, or
learning the foreground and/or background information of persons and objects,
but ignore holistic scene context. The holistic scene refers to the
functionality of a place in images, such as a dining room, playground, or
office. In this paper, by mimicking human understanding of images, we propose
an approach of \textbf{PR}actical \textbf{I}nference in \textbf{S}ocial
r\textbf{E}lation (PRISE), which concisely learns interactive features of
persons and discriminative features of holistic scenes. Technically, we develop
a simple and fast relational graph convolutional network to capture interactive
features of all persons in one image. To learn the holistic scene feature, we
elaborately design a contrastive learning task based on image scene
classification. To further boost the performance in social relation inference,
we collect and distribute a new large-scale dataset, which consists of about
240 thousand unlabeled images. Extensive experimental results show that our
novel learning framework significantly outperforms state-of-the-art methods,
e.g., PRISE achieves a 6.8$\%$ improvement for domain classification on the
PIPA dataset.
Related papers
- Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning (CDAL).
We propose a data repair algorithm to curate contextually fair data to reduce model bias.
We are working on developing an image retrieval system for wildlife camera trap images and a reliable warning system for poor-quality rural roads.
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underlies several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Fine-Grained Semantically Aligned Vision-Language Pre-Training [151.7372197904064]
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.
Existing methods mainly model the cross-modal alignment by the similarity of the global representations of images and texts.
We introduce LOUPE, a fine-grained semantically aLigned visiOn-langUage PrE-training framework, which learns fine-grained semantic alignment from the novel perspective of game-theoretic interactions.
arXiv Detail & Related papers (2022-08-04T07:51:48Z)
- SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z)
- Scenes and Surroundings: Scene Graph Generation using Relation Transformer [13.146732454123326]
This work proposes a novel local-context aware architecture named relation transformer.
Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships.
In comparison to state-of-the-art approaches, we achieve an overall mean improvement of 4.85%.
arXiv Detail & Related papers (2021-07-12T14:22:20Z)
- Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation [98.34909905511061]
We argue that a desirable scene graph should be hierarchically constructed, and introduce a new scheme, the Hierarchical Entity Tree (HET), for modeling the scene graph.
To generate a scene graph based on HET, we parse HET with a Hybrid Long Short-Term Memory (Hybrid-LSTM) which specifically encodes hierarchy and siblings context.
To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) to dynamically adjust their rankings.
arXiv Detail & Related papers (2020-07-17T05:12:13Z)