Motion2Language, unsupervised learning of synchronized semantic motion segmentation
- URL: http://arxiv.org/abs/2310.10594v2
- Date: Wed, 13 Dec 2023 17:29:15 GMT
- Title: Motion2Language, unsupervised learning of synchronized semantic motion segmentation
- Authors: Karim Radouane, Andon Tchechmedjiev, Julien Lagarde, Sylvie Ranwez
- Abstract summary: We investigate building a sequence-to-sequence architecture for motion-to-language translation and synchronization.
The aim is to translate motion capture inputs into English natural-language descriptions, such that the descriptions are generated synchronously with the actions performed.
We propose a new recurrent formulation of local attention that is suited for synchronous/live text generation, as well as an improved motion encoder architecture.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate building a sequence-to-sequence architecture
for motion-to-language translation and synchronization. The aim is to translate
motion capture inputs into English natural-language descriptions, such that the
descriptions are generated synchronously with the actions performed, enabling
semantic segmentation as a byproduct, but without requiring synchronized
training data. We propose a new recurrent formulation of local attention that
is suited for synchronous/live text generation, as well as an improved motion
encoder architecture better suited to smaller datasets and to synchronous
generation. We evaluate both contributions in individual experiments, using the
standard BLEU4 metric, as well as a simple semantic equivalence measure, on the
KIT motion language dataset. In a follow-up experiment, we assess the quality
of the synchronization of generated text in our proposed approaches through
multiple evaluation metrics. We find that our contributions to the attention
mechanism and to the encoder architecture additively improve not only the quality
of the generated text (BLEU and semantic equivalence) but also the quality of the
synchronization.
Our code is available at
https://github.com/rd20karim/M2T-Segmentation/tree/main
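
To make the proposed "recurrent formulation of local attention" more concrete, below is a minimal sketch, assuming a Luong-style local attention window whose centre can only advance over the encoded motion frames, so that words are emitted in step with the motion. All names here (local_attention_step, half-width D, centre p_t) are illustrative; the paper's actual formulation is in the repository above.

```python
# Minimal sketch (not the paper's exact formulation): local attention whose
# window centre moves monotonically forward, so each generated word attends
# to a steadily advancing span of motion frames.
import numpy as np

def local_attention_step(enc_states, query, p_prev, D=8, sigma=None):
    """One decoding step of monotone local attention.

    enc_states: (T, d) encoded motion frames
    query:      (d,)   current decoder hidden state
    p_prev:     float  window centre from the previous step
    D:          int    half-width of the attention window
    """
    T, d = enc_states.shape
    sigma = sigma or D / 2.0
    # Predict a centre offset in (0, 1) and scale it; never move backwards.
    # This monotone constraint is what makes generation synchronous.
    offset = 1.0 / (1.0 + np.exp(-query.mean()))        # stand-in predictor
    p_t = min(p_prev + offset * D, T - 1)               # monotone centre
    lo, hi = int(max(0, p_t - D)), int(min(T, p_t + D + 1))
    window = enc_states[lo:hi]                          # (W, d) local window
    scores = window @ query                             # dot-product scores
    # Gaussian prior keeps the weights focused near the centre p_t.
    positions = np.arange(lo, hi)
    gauss = np.exp(-((positions - p_t) ** 2) / (2 * sigma ** 2))
    weights = np.exp(scores - scores.max()) * gauss
    weights /= weights.sum()
    context = weights @ window                          # (d,) context vector
    return context, p_t

# Usage: carry p_t across decoding steps so the window advances with the text.
enc = np.random.randn(120, 64)   # 120 motion frames, 64-dim encodings
query = np.random.randn(64)
ctx, p = local_attention_step(enc, query, p_prev=0.0)
```

Carrying p_t across decoding steps is what makes the formulation recurrent: because the window can only move forward, the emitted words trace out a segmentation of the motion as a byproduct.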
Related papers
- Transformer with Controlled Attention for Synchronous Motion Captioning
This paper addresses a challenging task, synchronous motion captioning, which aims to generate a language description synchronized with human motion sequences.
Our method introduces mechanisms to control self- and cross-attention distributions of the Transformer, allowing interpretability and time-aligned text generation.
We demonstrate the superior performance of our approach through evaluation on the two available benchmark datasets, KIT-ML and HumanML3D.
arXiv Detail & Related papers (2024-09-13T20:30:29Z)
- An Automatic Quality Metric for Evaluating Simultaneous Interpretation
Simultaneous interpretation (SI) starts translation before the original speech has finished.
We propose an automatic evaluation metric for SI and simultaneous machine translation (SiMT) focusing on word order synchronization (a generic word-order proxy is sketched after this entry).
arXiv Detail & Related papers (2024-07-09T08:21:40Z)
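
The paper defines its own metric; purely as an illustration of what "word order synchronization" can mean, the sketch below scores how monotonically an output follows the source order using Kendall's tau over alignment positions. The alignment itself (which output position each source word maps to) is assumed to be given.

```python
# Illustrative only: a generic word-order proxy, Kendall's tau over alignment
# positions, which rewards outputs that emit words in source order.
from itertools import combinations

def kendall_tau(target_positions):
    """target_positions[i] = output position aligned to source word i."""
    pairs = list(combinations(range(len(target_positions)), 2))
    if not pairs:
        return 1.0
    # +1 for each source-ordered pair, -1 for each reordered pair.
    concordant = sum(
        1 if target_positions[j] > target_positions[i] else -1
        for i, j in pairs
    )
    return concordant / len(pairs)

print(kendall_tau([0, 1, 2, 3]))   # 1.0: perfectly monotone output
print(kendall_tau([3, 2, 1, 0]))   # -1.0: fully reordered output
```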
- Sequence Shortening for Context-Aware Machine Translation
We show that a special case of multi-encoder architecture achieves higher accuracy on contrastive datasets.
We introduce two novel methods, Latent Grouping and Latent Selecting, where the network learns to group tokens or to select the tokens to be cached as context.
arXiv Detail & Related papers (2024-02-02T13:55:37Z)
- Synchformer: Efficient Synchronization from Sparse Cues
Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling.
This approach achieves state-of-the-art performance in both dense and sparse settings.
We also extend synchronization model training to AudioSet, a million-scale 'in-the-wild' dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.
arXiv Detail & Related papers (2024-01-29T18:59:55Z)
- SemanticBoost: Elevating Motion Generation with Augmented Textual Cues
Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD).
The CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences.
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques.
arXiv Detail & Related papers (2023-10-31T09:58:11Z)
- Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation
We propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder.
Experiments on it, es, de -> en prove the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.
arXiv Detail & Related papers (2023-10-23T11:00:27Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language
We propose a novel text-to-motion retrieval task, which aims at retrieving relevant motions based on a specified natural-language description.
Inspired by the recent progress in text-to-image/video matching, we experiment with two widely-adopted metric-learning loss functions (one common choice is sketched after this entry).
arXiv Detail & Related papers (2023-05-25T08:32:41Z)
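
As an illustration of one widely-adopted metric-learning loss for such retrieval tasks, the sketch below implements a symmetric InfoNCE objective over paired text/motion embeddings; that this matches the paper's exact choice of losses is an assumption.

```python
# Symmetric InfoNCE over a batch of paired (text, motion) embeddings,
# a common metric-learning objective in cross-modal retrieval.
import numpy as np

def info_nce(text_emb, motion_emb, temperature=0.07):
    """text_emb, motion_emb: (B, d) arrays, assumed L2-normalised and paired."""
    logits = text_emb @ motion_emb.T / temperature   # (B, B) similarities
    B = logits.shape[0]

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)      # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        # The matching pairs sit on the diagonal: they are the positives.
        return -log_probs[np.arange(B), np.arange(B)].mean()

    # Average the text->motion and motion->text retrieval directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Usage with random, L2-normalised embeddings:
rng = np.random.default_rng(0)
t = rng.normal(size=(4, 32)); t /= np.linalg.norm(t, axis=1, keepdims=True)
m = rng.normal(size=(4, 32)); m /= np.linalg.norm(m, axis=1, keepdims=True)
print(info_nce(t, m))
```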
- Neural Machine Translation with Contrastive Translation Memories
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Bilingual Synchronization: Restoring Translational Relationships with Editing Operations
We consider a more general setting, which assumes an initial target sequence that must be transformed into a valid translation of the source.
Our results suggest that one single generic edit-based system, once fine-tuned, can compare with, or even outperform, dedicated systems specifically trained for these tasks.
arXiv Detail & Related papers (2022-10-24T12:25:44Z)
- Bi-Decoder Augmented Network for Neural Machine Translation
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space (a minimal sketch of this setup follows below).
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
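
A minimal, hypothetical sketch of the bi-decoder idea above: one shared encoder feeds two language-specific decoders, so gradients from both target languages shape a single source representation. The GRU-based module below is a simplification for illustration (attention and actual decoding loops are omitted), not the BiDAN architecture itself.

```python
# Hypothetical shared-encoder, two-decoder setup for joint NMT training.
import torch
import torch.nn as nn

class BiDecoderNMT(nn.Module):
    def __init__(self, vocab_src, vocab_a, vocab_b, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_src, d)
        self.encoder = nn.GRU(d, d, batch_first=True)   # one shared encoder
        self.dec_a = nn.GRU(d, d, batch_first=True)     # decoder, language A
        self.dec_b = nn.GRU(d, d, batch_first=True)     # decoder, language B
        self.out_a = nn.Linear(d, vocab_a)
        self.out_b = nn.Linear(d, vocab_b)

    def forward(self, src, tgt_a_emb, tgt_b_emb):
        # src: (B, S) token ids; tgt_*_emb: (B, T, d) teacher-forced inputs.
        _, h = self.encoder(self.embed(src))            # shared semantic state
        ya, _ = self.dec_a(tgt_a_emb, h)
        yb, _ = self.dec_b(tgt_b_emb, h)
        return self.out_a(ya), self.out_b(yb)

# Joint training: loss = ce(logits_a, refs_a) + ce(logits_b, refs_b), so
# gradients from both target languages flow into the single encoder.
```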