Action sequencing using visual permutations
- URL: http://arxiv.org/abs/2008.01156v2
- Date: Fri, 5 Feb 2021 02:34:31 GMT
- Title: Action sequencing using visual permutations
- Authors: Michael Burke, Kartic Subr, Subramanian Ramamoorthy
- Abstract summary: This work considers the task of neural action sequencing conditioned on a single reference visual state.
This paper takes a permutation perspective and argues that action sequencing benefits from the ability to reason about both permutations and ordering concepts.
- Score: 19.583283039057505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can easily reason about the sequence of high level actions needed to
complete tasks, but it is particularly difficult to instil this ability in
robots trained from relatively few examples. This work considers the task of
neural action sequencing conditioned on a single reference visual state. This
task is extremely challenging as it is not only subject to the significant
combinatorial complexity that arises from large action sets, but also requires
a model that can perform some form of symbol grounding, mapping high
dimensional input data to actions, while reasoning about action relationships.
This paper takes a permutation perspective and argues that action sequencing
benefits from the ability to reason about both permutations and ordering
concepts. Empirical analysis shows that neural models trained with latent
permutations outperform standard neural architectures in constrained action
sequencing tasks. Results also show that action sequencing using visual
permutations is an effective mechanism to initialise and speed up traditional
planning techniques and successfully scales to far greater action set sizes
than models considered previously.
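As a rough illustration of the latent-permutation idea described in the abstract, the sketch below encodes a reference goal image into a score matrix, relaxes it into a doubly-stochastic matrix with Sinkhorn normalisation, and uses that soft permutation to order a small candidate action set. The encoder architecture, temperature, and iteration count are placeholder assumptions, not the authors' implementation.
```python
# A rough sketch (not the authors' code) of neural action sequencing with a
# latent permutation: a reference goal image is encoded into a score matrix,
# Sinkhorn normalisation relaxes it into a doubly-stochastic matrix, and that
# soft permutation orders a fixed set of candidate actions.
import torch
import torch.nn as nn


def sinkhorn(log_alpha: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Alternate row/column normalisation in log-space, yielding an
    approximately doubly-stochastic (relaxed permutation) matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)
    return log_alpha.exp()


class PermutationSequencer(nn.Module):
    def __init__(self, n_actions: int, temperature: float = 0.5):
        super().__init__()
        self.n_actions = n_actions
        self.temperature = temperature
        # Toy image encoder; the paper's architecture will differ.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_actions * n_actions),
        )

    def forward(self, goal_image: torch.Tensor) -> torch.Tensor:
        # scores[p, a]: evidence that action a belongs at sequence position p.
        scores = self.encoder(goal_image).view(-1, self.n_actions, self.n_actions)
        return sinkhorn(scores / self.temperature)  # (batch, n_actions, n_actions)


# Usage: order one-hot action identifiers with the predicted soft permutation.
model = PermutationSequencer(n_actions=4)
goal_image = torch.randn(1, 3, 64, 64)       # single reference visual state
perm = model(goal_image)
actions = torch.eye(4).unsqueeze(0)          # candidate action embeddings
sequence = perm @ actions                    # row p: soft choice of action at step p
print(sequence.shape)                        # torch.Size([1, 4, 4])
```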
Related papers
- SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation [62.58480650443393]
SAM-E leverages Segment Anything (SAM), a vision foundation model, for generalizable scene understanding, combined with sequence imitation for embodied manipulation.
We develop a novel multi-channel heatmap that enables the prediction of the action sequence in a single pass.
arXiv Detail & Related papers (2024-05-30T00:32:51Z)
- Continuous-time convolutions model of event sequences [46.3471121117337]
Event sequences are non-uniform and sparse, making traditional models unsuitable.
We propose COTIC, a method based on an efficient convolutional neural network designed to handle the non-uniform occurrence of events over time.
COTIC outperforms existing models in predicting the next event time and type, achieving an average rank of 1.5 compared to 3.714 for the nearest competitor.
arXiv Detail & Related papers (2023-02-13T10:34:51Z)
- Editing Models with Task Arithmetic [69.97273155842966]
Changing how pre-trained models behave is a common practice when developing machine learning systems.
We build task vectors by subtracting the weights of a pre-trained model from the weights of the same model after fine-tuning on a task.
We show that these task vectors can be modified and combined through arithmetic operations such as negation and addition (a minimal sketch appears after this list).
arXiv Detail & Related papers (2022-12-08T05:50:53Z)
- Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
- Consequence-aware Sequential Counterfactual Generation [5.71097144710995]
We propose a model-agnostic method for sequential counterfactual generation.
Our approach generates less costly solutions, is more efficient, and provides the user with a diverse set of solutions to choose from.
arXiv Detail & Related papers (2021-04-12T16:10:03Z)
- Efficient and Interpretable Robot Manipulation with Graph Neural Networks [7.799182201815763]
We represent manipulation tasks as operations over graphs, using graph neural networks (GNNs).
Our formulation first transforms the environment into a graph representation, then applies a trained GNN policy to predict which object to manipulate towards which goal state.
Our GNN policies are trained using very few expert demonstrations on simple tasks, and exhibit generalization over the number and configuration of objects in the environment.
We present experiments showing that a single learned GNN policy can solve a variety of block-stacking tasks in simulation and on real hardware.
arXiv Detail & Related papers (2021-02-25T21:09:12Z)
- Few-shot Sequence Learning with Transformers [79.87875859408955]
Few-shot algorithms aim at learning new tasks provided only a handful of training examples.
In this work we investigate few-shot learning in the setting where the data points are sequences of tokens.
We propose an efficient learning algorithm based on Transformers.
arXiv Detail & Related papers (2020-12-17T12:30:38Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image [43.05971157389743]
We propose a deep convolutional recurrent neural network that predicts action sequences for task and motion planning (TAMP) from an initial scene image.
A key aspect is that our method generalizes to scenes with many and varying numbers of objects, despite being trained on only two objects at a time.
arXiv Detail & Related papers (2020-06-09T16:52:02Z)
- Inferring Temporal Compositions of Actions Using Probabilistic Automata [61.09176771931052]
We propose to express temporal compositions of actions as semantic regular expressions and derive an inference framework using probabilistic automata.
Our approach is different from existing works that either predict long-range complex activities as unordered sets of atomic actions, or retrieve videos using natural language sentences.
arXiv Detail & Related papers (2020-04-28T00:15:26Z)
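The Editing Models with Task Arithmetic entry above describes a weight-space recipe: a task vector is the fine-tuned weights minus the pre-trained weights, negation forgets a task, and addition combines tasks. Below is a minimal sketch under that reading, using plain PyTorch state dicts; the function names and toy checkpoints are illustrative, not the paper's released code.
```python
# Illustrative sketch of task vectors (not the paper's released code):
# tau = theta_finetuned - theta_pretrained; negating tau "forgets" a task,
# while adding several task vectors combines their behaviours.
import torch


def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """Per-parameter difference between fine-tuned and pre-trained weights."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}


def apply_task_vectors(pretrained: dict, vectors: list, scale: float = 1.0) -> dict:
    """theta_new = theta_pretrained + scale * sum(task vectors)."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for vec in vectors:
        for k in edited:
            edited[k] += scale * vec[k]
    return edited


# Toy state dicts standing in for real checkpoints.
theta_pre = {"w": torch.zeros(3)}
theta_ft = {"w": torch.tensor([0.5, -0.2, 0.1])}
tau = task_vector(theta_pre, theta_ft)
negated = {k: -v for k, v in tau.items()}        # negation: forget the task
edited = apply_task_vectors(theta_pre, [negated])
print(edited["w"])                               # tensor([-0.5000,  0.2000, -0.1000])
```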
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.