Do Trajectories Encode Verb Meaning?
- URL: http://arxiv.org/abs/2206.11953v1
- Date: Thu, 23 Jun 2022 19:57:16 GMT
- Title: Do Trajectories Encode Verb Meaning?
- Authors: Dylan Ebert, Chen Sun, Ellie Pavlick
- Abstract summary: Grounded language models learn to connect concrete categories like nouns and adjectives to the world via images and videos.
In this paper, we investigate the extent to which trajectories (i.e. the position and rotation of objects over time) naturally encode verb semantics.
We find that trajectories correlate as-is with some verbs (e.g., fall), and that additional abstraction via self-supervised pretraining can further capture nuanced differences in verb meaning.
- Score: 22.409307683247967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributional models learn representations of words from text, but are
criticized for their lack of grounding, or the linking of text to the
non-linguistic world. Grounded language models have had success in learning to
connect concrete categories like nouns and adjectives to the world via images
and videos, but can struggle to isolate the meaning of the verbs themselves
from the context in which they typically occur. In this paper, we investigate
the extent to which trajectories (i.e. the position and rotation of objects
over time) naturally encode verb semantics. We build a procedurally generated
agent-object-interaction dataset, obtain human annotations for the verbs that
occur in this data, and compare several methods for representation learning
given the trajectories. We find that trajectories correlate as-is with some
verbs (e.g., fall), and that additional abstraction via self-supervised
pretraining can further capture nuanced differences in verb meaning (e.g., roll
vs. slide).
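As a concrete illustration of the trajectory representation described above, here is a minimal sketch. The array layout, the function name, and the simple net-drop heuristic for "fall" are our own illustrative assumptions, not the paper's actual dataset format or method:

```python
import numpy as np

# A trajectory, as studied in the paper, is the position and rotation of an
# object over time. Here we assume a (T, 7) array per object:
# [x, y, z, qx, qy, qz, qw] for each of T frames, with y as the vertical axis.

def looks_like_fall(trajectory: np.ndarray, drop_threshold: float = 0.5) -> bool:
    """Toy heuristic: True if the object's height drops by more than drop_threshold."""
    heights = trajectory[:, 1]                      # y-coordinate per frame
    return float(heights[0] - heights[-1]) > drop_threshold

# Synthetic example: an object dropping from y=1.0 to y=0.0 over 10 frames.
T = 10
falling = np.zeros((T, 7))
falling[:, 1] = np.linspace(1.0, 0.0, T)            # height decreases over time
falling[:, 6] = 1.0                                 # identity rotation (qw = 1)

# Contrast: an object translating along x at constant height.
sliding = np.zeros((T, 7))
sliding[:, 0] = np.linspace(0.0, 1.0, T)            # x changes, height constant
sliding[:, 6] = 1.0

print(looks_like_fall(falling))                     # True: clear vertical drop
print(looks_like_fall(sliding))                     # False: no vertical drop
```

This hand-written feature separates *fall* from *slide*, which matches the paper's finding that raw trajectories correlate with some verbs as-is; distinguishing *roll* from *slide*, by contrast, depends on how rotation co-varies with translation, which is where the learned abstractions come in.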
Related papers
- Skill Generalization with Verbs [20.90116318432194]
It is imperative that robots can understand natural language commands issued by humans.
We propose a method for generalizing manipulation skills to novel objects using verbs.
We show that our model can generate trajectories that are usable for executing five verb commands applied to novel instances of two different object categories on a real robot.
arXiv Detail & Related papers (2024-10-18T02:12:18Z)
- Verbs in Action: Improving verb understanding in video-language models [128.87443209118726]
State-of-the-art video-language models based on CLIP have been shown to have limited verb understanding.
We improve verb understanding for CLIP-based video-language models by proposing a new Verb-Focused Contrastive framework.
arXiv Detail & Related papers (2023-04-13T17:57:01Z)
- GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement [73.73599110214828]
Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of images for "human-like" event understanding.
Inspired by object detection and image captioning tasks, existing methods typically employ a two-stage framework.
We propose a novel two-stage framework that focuses on utilizing such bidirectional relations within verbs and roles.
arXiv Detail & Related papers (2022-08-18T17:13:59Z)
- Disentangled Action Recognition with Knowledge Bases [77.77482846456478]
We aim to improve the generalization ability of the compositional action recognition model to novel verbs or novel nouns.
Previous work utilizes verb-noun compositional action nodes in the knowledge graph, making it inefficient to scale.
We propose our approach: Disentangled Action Recognition with Knowledge-bases (DARK), which leverages the inherent compositionality of actions.
arXiv Detail & Related papers (2022-07-04T20:19:13Z)
- Grounding Spatio-Temporal Language with Transformers [22.46291815734606]
We introduce a novel spatio-temporal language grounding task to learn the meaning of behavioral traces of an embodied agent.
This is achieved by training a function that predicts if a description matches a given history of observations.
To study the role of architecture in generalization on this task, we train several models, including multimodal Transformer architectures.
arXiv Detail & Related papers (2021-06-16T15:28:22Z)
- Verb Knowledge Injection for Multilingual Event Processing [50.27826310460763]
We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers.
We first demonstrate that injecting verb knowledge leads to performance gains in English event extraction.
We then explore the utility of verb adapters for event extraction in other languages.
arXiv Detail & Related papers (2020-12-31T03:24:34Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- COBE: Contextualized Object Embeddings from Narrated Instructional Video [52.73710465010274]
We propose a new framework for learning Contextualized OBject Embeddings from automatically-transcribed narrations of instructional videos.
We leverage the semantic and compositional structure of language by training a visual detector to predict a contextualized word embedding of the object and its associated narration.
Our experiments show that our detector learns to predict a rich variety of contextual object information, and that it is highly effective in the settings of few-shot and zero-shot learning.
arXiv Detail & Related papers (2020-07-14T19:04:08Z)
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning [29.181547214915238]
We show that an attacker can control the "meaning" of new and existing words by changing their locations in the embedding space.
An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios.
arXiv Detail & Related papers (2020-01-14T17:48:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.