Representation Learning of Image Schema
- URL: http://arxiv.org/abs/2207.08256v1
- Date: Sun, 17 Jul 2022 18:42:37 GMT
- Title: Representation Learning of Image Schema
- Authors: Fajrian Yunus, Chloé Clavel, Catherine Pelachaud
- Abstract summary: We propose a technique to learn the vector representation of image schemas.
Our main goal is to generate metaphoric gestures for an Embodied Conversational Agent.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An image schema is a recurrent pattern of reasoning in which one entity is
mapped onto another. Image schemas are similar to conceptual metaphors and are also
related to metaphoric gestures. Our main goal is to generate metaphoric gestures
for an Embodied Conversational Agent.
We propose a technique to learn the vector representation of image schemas.
As far as we are aware, this is the first work that addresses this problem.
Our technique uses Ravenet et al.'s algorithm to compute the image schemas from
the text input, and BERT and SenseBERT as the base word embedding techniques
from which the final vector representation of each image schema is calculated.
Our representation learning technique works by clustering: word embedding
vectors that belong to the same image schema should lie relatively close to
each other, and thus form a cluster.
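A minimal sketch of this clustering idea is given below. It is not the authors' implementation: `detect_image_schema` is a hypothetical stand-in for Ravenet et al.'s text-to-image-schema algorithm, plain BERT from Hugging Face `transformers` is used in place of SenseBERT, and taking the per-schema centroid as the image-schema vector is an assumption about how the final representation is computed.

```python
# Sketch only: bucket contextual BERT embeddings by the image schema assigned to
# each word, then take per-schema centroids as the image-schema vectors.
from collections import defaultdict

import numpy as np
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()


def detect_image_schema(word: str) -> str | None:
    """Hypothetical placeholder for Ravenet et al.'s algorithm: map a word to an
    image schema label (e.g. CONTAINER, PATH) or None if no schema applies."""
    toy_lexicon = {"in": "CONTAINER", "into": "CONTAINER", "toward": "PATH"}
    return toy_lexicon.get(word.lower())


def schema_vectors(sentences: list[str]) -> dict[str, np.ndarray]:
    buckets: dict[str, list[np.ndarray]] = defaultdict(list)
    for sentence in sentences:
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (num_tokens, 768)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for token, vec in zip(tokens, hidden):
            schema = detect_image_schema(token)
            if schema is not None:
                buckets[schema].append(vec.numpy())
    # Each image schema is represented by the centroid of its word vectors,
    # i.e. the center of the cluster formed by the embeddings of its words.
    return {schema: np.mean(vecs, axis=0) for schema, vecs in buckets.items()}
```

SenseBERT can be swapped in by changing the model name; subword handling (WordPiece pieces versus whole words) is glossed over in this sketch.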
With the image schemas represented as vectors, it also becomes possible to say
that some image schemas are closer, or more similar, to each other than to
others, because the distance between two vectors is a proxy for the
dissimilarity between the corresponding image schemas. Therefore, after
obtaining the vector representations of the image schemas, we calculate the
distances between those vectors. Based on these distances, we create
visualizations that illustrate the relative distances between the different
image schemas.
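Continuing the sketch above, the pairwise distances and a simple visualization could be computed as follows; the cosine metric and the heatmap are assumptions, since the abstract only states that vector distance serves as a proxy for dissimilarity.

```python
# Sketch only: pairwise distances between image-schema vectors and a heatmap.
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial.distance import pdist, squareform


def plot_schema_distances(vectors: dict[str, np.ndarray]) -> None:
    labels = sorted(vectors)
    # Condensed pairwise cosine distances, expanded to a square matrix.
    matrix = squareform(pdist(np.stack([vectors[l] for l in labels]), metric="cosine"))
    fig, ax = plt.subplots()
    im = ax.imshow(matrix, cmap="viridis")
    ax.set_xticks(range(len(labels)), labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)), labels)
    fig.colorbar(im, ax=ax, label="cosine distance")
    fig.tight_layout()
    plt.show()


# Example usage with the sketch above (hypothetical input sentence):
# plot_schema_distances(schema_vectors(["She poured her ideas into the paper."]))
```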
Related papers
- Using Images to Find Context-Independent Word Representations in Vector Space [3.2634122554914002]
We propose a novel method of using dictionary meanings and image depictions to find word vectors independent of any context.
Our method performs comparably to context-based methods while taking much less training time.
arXiv Detail & Related papers (2024-11-28T08:44:10Z) - Patch-wise Graph Contrastive Learning for Image Translation [69.85040887753729]
We exploit a graph neural network to capture topology-aware features.
We construct the graph based on patch-wise similarity from a pretrained encoder.
To capture the hierarchical semantic structure, we propose graph pooling.
arXiv Detail & Related papers (2023-12-13T15:45:19Z) - STAIR: Learning Sparse Text and Image Representation in Grounded Tokens [84.14528645941128]
We show that it is possible to build a sparse semantic representation that is as powerful as, or even better than, dense representations.
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
It significantly outperforms a CLIP model, with +4.9% and +4.3% absolute Recall@1 improvements.
arXiv Detail & Related papers (2023-01-30T17:21:30Z) - DSI2I: Dense Style for Unpaired Image-to-Image Translation [70.93865212275412]
Unpaired exemplar-based image-to-image (UEI2I) translation aims to translate a source image to a target image domain with the style of a target image exemplar.
We propose to represent style as a dense feature map, allowing for a finer-grained transfer to the source image without requiring any external semantic information.
Our results show that the translations produced by our approach are more diverse, preserve the source content better, and are closer to the exemplars when compared to the state-of-the-art methods.
arXiv Detail & Related papers (2022-12-26T18:45:25Z) - Target-oriented Sentiment Classification with Sequential Cross-modal
Semantic Graph [27.77392307623526]
Multi-modal aspect-based sentiment classification (MABSC) is the task of classifying the sentiment of a target entity mentioned in a sentence and an image.
Previous methods failed to account for the fine-grained semantic association between the image and the text.
We propose a new approach called SeqCSG, which enhances the encoder-decoder sentiment classification framework using sequential cross-modal semantic graphs.
arXiv Detail & Related papers (2022-08-19T16:04:29Z) - Image-to-Image Retrieval by Learning Similarity between Scene Graphs [5.284353899197193]
We propose a novel approach for image-to-image retrieval using scene graph similarity measured by graph neural networks.
In our approach, graph neural networks are trained to predict the proxy image relevance measure, computed from human-annotated captions.
arXiv Detail & Related papers (2020-12-29T10:45:20Z) - Embedding Words in Non-Vector Space with Unsupervised Graph Learning [33.51809615505692]
We introduce GraphGlove: unsupervised graph word representations that are learned end-to-end.
In our setting, each word is a node in a weighted graph and the distance between words is the shortest path distance between the corresponding nodes (a toy sketch of this graph distance appears after this list).
We show that our graph-based representations substantially outperform vector-based methods on word similarity and analogy tasks.
arXiv Detail & Related papers (2020-10-06T10:17:49Z) - Cross-domain Correspondence Learning for Exemplar-based Image
Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output has the style (e.g., color, texture) in consistency with the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z) - Structural-analogy from a Single Image Pair [118.61885732829117]
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high quality imagery in other conditional generation tasks utilizing images A and B only.
arXiv Detail & Related papers (2020-04-05T14:51:10Z) - Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.