Learning Joint Representation of Human Motion and Language
- URL: http://arxiv.org/abs/2210.15187v1
- Date: Thu, 27 Oct 2022 05:32:20 GMT
- Title: Learning Joint Representation of Human Motion and Language
- Authors: Jihoon Kim, Youngjae Yu, Seungyoun Shin, Taehyun Byun, Sungjoon Choi
- Abstract summary: We present MoLang (a Motion-Language connecting model) for learning joint representation of human motion and language.
We propose a motion-language model with contrastive learning, empowering our model to learn better generalizable representations of the human motion domain.
- Score: 22.29342443400645
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work, we present MoLang (a Motion-Language connecting model) for
learning joint representation of human motion and language, leveraging both
unpaired and paired datasets of motion and language modalities. To this end, we
propose a motion-language model with contrastive learning, empowering our model
to learn better generalizable representations of the human motion domain.
Empirical results show that our model learns strong representations of human
motion data through navigating language modality. Our proposed method is able
to perform both action recognition and motion retrieval tasks with a single
model where it outperforms state-of-the-art approaches on a number of action
recognition benchmarks.
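The abstract names contrastive learning between motion and language but gives no loss or architecture. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE objective over paired motion and text embeddings; the function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the row max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(motion_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: the i-th motion and i-th text form the positive pair.

    The temperature default is an illustrative assumption, not a value from the paper.
    """
    motion = [l2_normalize(v) for v in motion_embs]
    text = [l2_normalize(v) for v in text_embs]
    # Similarity matrix: rows index motions, columns index texts.
    logits = [
        [sum(a * b for a, b in zip(m, t)) / temperature for t in text]
        for m in motion
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    motion_to_text = diagonal_cross_entropy(logits)
    text_to_motion = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (motion_to_text + text_to_motion)


# Toy 2-D embeddings standing in for encoder outputs (hypothetical data).
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                      [[0.0, 1.0], [1.0, 0.0]])
uninformative = symmetric_contrastive_loss([[1.0, 0.0], [1.0, 0.0]],
                                           [[1.0, 0.0], [1.0, 0.0]])
```

In this toy setup the loss is near zero when each motion matches its own caption, large when the pairs are swapped, and equal to log 2 when the embeddings carry no information about which caption belongs to which motion.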
Related papers
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- A Grammatical Compositional Model for Video Action Detection [24.546886938243393]
We present a novel Grammatical Compositional Model (GCM) for action detection based on typical And-Or graphs.
Our model exploits the intrinsic structures and latent relationships of actions in a hierarchical manner to harness both the compositionality of grammar models and the capability of expressing rich features of DNNs.
arXiv Detail & Related papers (2023-10-04T15:24:00Z)
- Fine-Tune Language Models as Multi-Modal Differential Equation Solvers [14.181842691371935]
We present a transformation of in-context operator learning into a multi-modal paradigm.
In particular, we take inspiration from the recent success of large language models, and propose using "captions" to integrate human knowledge about the operator.
arXiv Detail & Related papers (2023-08-09T16:44:25Z)
- MotionGPT: Human Motion as a Foreign Language [47.21648303282788]
Human motion displays a semantic coupling akin to human language, often perceived as a form of body language.
By fusing language data with large-scale motion models, motion-language pre-training can enhance the performance of motion-related tasks.
We propose MotionGPT, a unified, versatile, and user-friendly motion-language model to handle multiple motion-relevant tasks.
arXiv Detail & Related papers (2023-06-26T15:53:02Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- TEMOS: Generating diverse human motions from textual descriptions [53.85978336198444]
We address the problem of generating diverse 3D human motions from textual descriptions.
We propose TEMOS, a text-conditioned generative model leveraging variational autoencoder (VAE) training with human motion data.
We show that the TEMOS framework can produce both skeleton-based animations, as in prior work, and more expressive SMPL body motions.
arXiv Detail & Related papers (2022-04-25T14:53:06Z)
- Context-Aware Language Modeling for Goal-Oriented Dialogue Systems [84.65707332816353]
We formulate goal-oriented dialogue as a partially observed Markov decision process.
We derive a simple and effective method to finetune language models in a goal-aware way.
We evaluate our method on a practical flight-booking task using AirDialogue.
arXiv Detail & Related papers (2022-04-18T17:23:11Z)
- Towards Interactive Language Modeling [18.925337115380703]
Motivated by these considerations, we pioneer the space of interactive language modeling.
We present a road map in which we detail the steps that need to be taken towards interactive language modeling.
This work aims to be the start of a larger research agenda on interactive language modeling.
arXiv Detail & Related papers (2021-12-14T18:35:02Z)
- Multi-agent Communication meets Natural Language: Synergies between Functional and Structural Language Learning [16.776753238108036]
We present a method for combining multi-agent communication and traditional data-driven approaches to natural language learning.
Our starting point is a language model that has been trained on generic, not task-specific language data.
We then place this model in a multi-agent self-play environment that generates task-specific rewards used to adapt or modulate the model.
arXiv Detail & Related papers (2020-05-14T15:32:23Z)