Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data
- URL: http://arxiv.org/abs/2203.04218v1
- Date: Tue, 8 Mar 2022 17:39:16 GMT
- Title: Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data
- Authors: Minori Toyoda, Kanata Suzuki, Yoshihiko Hayashi, Tetsuya Ogata
- Abstract summary: This study proposes a two-stage training method for bidirectional translation.
We train recurrent autoencoders (RAEs) for descriptions and actions with a large amount of non-paired data.
Then, we fine-tune the entire model to bind their intermediate representations using small paired data.
- Score: 5.188295416244741
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study achieved bidirectional translation between descriptions and
actions using small paired data. The ability to mutually generate descriptions
and actions is essential for robots to collaborate with humans in their daily
lives. The robot is required to associate real-world objects with linguistic
expressions, and large-scale paired data are required for machine learning
approaches. However, a paired dataset is expensive to construct and difficult
to collect. This study proposes a two-stage training method for bidirectional
translation. In the proposed method, we train recurrent autoencoders (RAEs) for
descriptions and actions with a large amount of non-paired data. Then, we
fine-tune the entire model to bind their intermediate representations using
small paired data. Because the data used for pre-training do not require
pairing, behavior-only data or a large language corpus can be used. We
experimentally evaluated our method using a paired dataset consisting of
motion-captured actions and descriptions. The results showed that our method
performed well, even when the amount of paired data to train was small. The
visualization of the intermediate representations of each RAE showed that
similar actions were encoded in clustered positions and that the corresponding
feature vectors were well aligned.
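The listing does not give the fine-tuning objective explicitly. As a rough illustrative sketch, assuming the second stage combines per-modality reconstruction losses with a simple MSE "binding" term that pulls the paired intermediate representations of the two RAEs together (the function name, the MSE choices, and the weight `lam` are all assumptions, not the paper's exact formulation):

```python
import numpy as np

def binding_finetune_loss(desc_in, desc_recon, act_in, act_recon,
                          z_desc, z_act, lam=1.0):
    """Illustrative Stage-2 loss: reconstruction error for each RAE plus a
    binding term aligning the paired latent vectors. All names and loss
    choices here are assumptions for illustration only."""
    rec_desc = np.mean((desc_in - desc_recon) ** 2)   # description RAE reconstruction
    rec_act = np.mean((act_in - act_recon) ** 2)      # action RAE reconstruction
    bind = np.mean((z_desc - z_act) ** 2)             # align intermediate representations
    return rec_desc + rec_act + lam * bind

# Toy check: perfect reconstructions, latents offset by 0.1 in every dimension,
# so only the binding term contributes: mean(0.1**2) = 0.01.
x = np.ones((4, 8))
z_desc = np.zeros((4, 16))
z_act = np.full((4, 16), 0.1)
loss = binding_finetune_loss(x, x, x, x, z_desc, z_act, lam=1.0)
print(loss)
```

Because the pre-trained RAEs already reconstruct well on unpaired data, the binding term is what the small paired set primarily has to teach, which is consistent with the paper's claim that little paired data suffices.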
Related papers
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data [95.0476489266988]
We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models.
Our proposed method trains a captioner to learn from paired data and to progressively associate unpaired data.
We report extensive empirical results on both (1) image-based and (2) dense region-based captioning datasets, followed by comprehensive analysis on the scarcely-paired dataset.
arXiv Detail & Related papers (2023-01-26T15:25:43Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
- Embracing Structure in Data for Billion-Scale Semantic Product Search [14.962039276966319]
We present principled approaches to train and deploy dyadic neural embedding models at the billion scale.
We show that exploiting the natural structure of real-world datasets helps address both challenges efficiently.
arXiv Detail & Related papers (2021-10-12T16:14:13Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- Recognition and Processing of NATOM [0.0]
This paper shows how to process NOTAM (Notice to Airmen) data in the civil aviation field.
The original NOTAM data mixes Chinese and English and is poorly structured.
GloVe word vectors are used to represent the data, together with a custom mapping vocabulary.
arXiv Detail & Related papers (2021-04-29T10:12:00Z)
- Learning to Match Jobs with Resumes from Sparse Interaction Data using Multi-View Co-Teaching Network [83.64416937454801]
Job-resume interaction data is sparse and noisy, which affects the performance of job-resume match algorithms.
We propose a novel multi-view co-teaching network from sparse interaction data for job-resume matching.
Our model is able to outperform state-of-the-art methods for job-resume matching.
arXiv Detail & Related papers (2020-09-25T03:09:54Z)
- Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization [6.465251961564605]
Supervised approaches for Neural Abstractive Summarization require large annotated corpora that are costly to build.
We present a French meeting summarization task where reports are predicted based on the automatic transcription of the meeting audio recordings.
We report large improvements compared to the previous baseline for both approaches on two evaluation sets.
arXiv Detail & Related papers (2020-07-30T08:22:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.