META4: Semantically-Aligned Generation of Metaphoric Gestures Using
Self-Supervised Text and Speech Representation
- URL: http://arxiv.org/abs/2311.05481v2
- Date: Tue, 21 Nov 2023 10:26:29 GMT
- Authors: Mireille Fares, Catherine Pelachaud, Nicolas Obin
- Abstract summary: We introduce META4, a deep learning approach that generates metaphoric gestures from both speech and Image Schemas.
Our approach has two primary goals: computing Image Schemas from input text to capture the underlying semantic and metaphorical meaning, and generating metaphoric gestures driven by speech and the computed Image Schemas.
- Score: 2.7317088388886384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image Schemas are repetitive cognitive patterns that influence the way we
conceptualize and reason about various concepts present in speech. These
patterns are deeply embedded within our cognitive processes and are reflected
in our bodily expressions including gestures. Particularly, metaphoric gestures
possess essential characteristics and semantic meanings that align with Image
Schemas, to visually represent abstract concepts. The shape and form of
gestures can convey abstract concepts, such as extending the forearm and hand
or tracing a line with hand movements to visually represent the image schema of
PATH. Previous behavior generation models have primarily focused on utilizing
speech (acoustic features and text) to drive the generation model of virtual
agents. They have not considered key semantic information, such as that carried by
Image Schemas to effectively generate metaphoric gestures. To address this
limitation, we introduce META4, a deep learning approach that generates
metaphoric gestures from both speech and Image Schemas. Our approach has two
primary goals: computing Image Schemas from input text to capture the
underlying semantic and metaphorical meaning, and generating metaphoric
gestures driven by speech and the computed image schemas. Our approach is the
first method for generating speech-driven metaphoric gestures while leveraging
the potential of Image Schemas. We demonstrate the effectiveness of our
approach and highlight the importance of both speech and image schemas in
modeling metaphoric gestures.
Related papers
- Compositional Entailment Learning for Hyperbolic Vision-Language Models [54.41927525264365]
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
(2024-10-09T14:12:50Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC)
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
(2023-09-26T02:22:39Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
(2023-08-06T03:22:46Z)
- I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors [38.70166865926743]
We propose a new task of generating visual metaphors from linguistic metaphors.
This is a challenging task for diffusion-based text-to-image models, since it requires the ability to model implicit meaning and compositionality.
We create a high-quality dataset containing 6,476 visual metaphors for 1,540 linguistic metaphors and their associated visual elaborations.
(2023-05-24T05:01:10Z)
- MetaCLUE: Towards Comprehensive Visual Metaphors Research [43.604408485890275]
We introduce MetaCLUE, a set of vision tasks on visual metaphor.
We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations.
We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.
(2022-12-19T22:41:46Z)
- Cross-Modal Alignment Learning of Vision-Language Conceptual Systems [24.423011687551433]
We propose methods for learning aligned vision-language conceptual systems inspired by infants' word learning mechanisms.
The proposed model learns the associations of visual objects and words online and gradually constructs cross-modal relational graph networks.
(2022-07-31T08:39:53Z)
- Emergent Graphical Conventions in a Visual Communication Game [80.79297387339614]
Humans communicate with graphical sketches apart from symbolic languages.
We take the very first step to model and simulate such an evolution process via two neural agents playing a visual communication game.
We devise a novel reinforcement learning method such that agents are evolved jointly towards successful communication and abstract graphical conventions.
(2021-11-28T18:59:57Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components: automatic identification of perceptually salient directions based on their layer selectivity; human annotation of these directions with free-form, compositional natural language descriptions.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
(2021-10-08T17:58:19Z)
- Metaphor Generation with Conceptual Mappings [58.61307123799594]
We aim to generate a metaphoric sentence given a literal expression by replacing relevant verbs.
We propose to control the generation process by encoding conceptual mappings between cognitive domains.
We show that the unsupervised CM-Lex model is competitive with recent deep learning metaphor generation systems.
(2021-06-02T15:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.