LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
- URL: http://arxiv.org/abs/2007.15781v1
- Date: Fri, 31 Jul 2020 00:13:54 GMT
- Title: LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
- Authors: Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu
- Abstract summary: We introduce the LEMMA dataset to provide a single home to address missing dimensions with meticulously designed settings.
We densely annotate the atomic-actions with human-object interactions to provide ground-truths of the compositionality, scheduling, and assignment of daily activities.
We hope this effort would drive the machine vision community to examine goal-directed human activities and further study the task scheduling and assignment in the real world.
- Score: 119.88381048477854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding and interpreting human actions is a long-standing challenge and
a critical indicator of perception in artificial intelligence. However, a few
imperative components of daily human activities are largely missed in prior
literature, including the goal-directed actions, concurrent multi-tasks, and
collaborations among multi-agents. We introduce the LEMMA dataset to provide a
single home to address these missing dimensions with meticulously designed
settings, wherein the number of tasks and agents varies to highlight different
learning objectives. We densely annotate the atomic-actions with human-object
interactions to provide ground-truths of the compositionality, scheduling, and
assignment of daily activities. We further devise challenging compositional
action recognition and action/task anticipation benchmarks with baseline models
to measure the capability of compositional action understanding and temporal
reasoning. We hope this effort would drive the machine vision community to
examine goal-directed human activities and further study the task scheduling
and assignment in the real world.
Related papers
- CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics [44.30880626337739]
We introduce Cooperative Human-Object Interaction (CooHOI), a novel framework that addresses multi-character object transport through a two-phase learning paradigm.
CooHOI is inherently efficient, does not depend on motion capture data of multi-character interactions, and can be seamlessly extended to include more participants.
arXiv Detail & Related papers (2024-06-20T17:59:22Z)
- Multitask Multimodal Prompted Training for Interactive Embodied Task Completion [48.69347134411864]
Embodied MultiModal Agent (EMMA) is a unified encoder-decoder model that reasons over images and trajectories.
By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks.
arXiv Detail & Related papers (2023-11-07T15:27:52Z)
- Continual Robot Learning using Self-Supervised Task Inference [19.635428830237842]
We propose a self-supervised task inference approach to continually learn new tasks.
We use a behavior-matching self-supervised learning objective to train a novel Task Inference Network (TINet).
A multi-task policy is built on top of the TINet and trained with reinforcement learning to optimize performance over tasks.
arXiv Detail & Related papers (2023-09-10T09:32:35Z)
- Object-Centric Multi-Task Learning for Human Instances [8.035105819936808]
We explore a compact multi-task network architecture that maximally shares the parameters of the multiple tasks via object-centric learning.
We propose a novel query design, called human-centric query (HCQ), to encode the human instance information effectively.
Experimental results show that the proposed multi-task network achieves comparable accuracy to state-of-the-art task-specific models.
arXiv Detail & Related papers (2023-03-13T01:10:50Z)
- CompoSuite: A Compositional Reinforcement Learning Benchmark [20.89464587308586]
We present CompoSuite, an open-source benchmark for compositional multi-task reinforcement learning (RL).
Each CompoSuite task requires a particular robot arm to manipulate one individual object to achieve a task objective while avoiding an obstacle.
We benchmark existing single-task, multi-task, and compositional learning algorithms on various training settings, and assess their capability to compositionally generalize to unseen tasks.
arXiv Detail & Related papers (2022-07-08T22:01:52Z)
- Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning [28.104054292437525]
We disentangle the effect of scale and relatedness of tasks in multi-task representation learning.
If the target tasks are known ahead of time, then training on a smaller set of related tasks is competitive with large-scale multi-task training.
arXiv Detail & Related papers (2022-04-23T18:11:35Z)
- Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z)
- Towards More Generalizable One-shot Visual Imitation Learning [81.09074706236858]
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z)
- Distribution Matching for Heterogeneous Multi-Task Learning: a Large-scale Face Study [75.42182503265056]
Multi-Task Learning has emerged as a methodology in which multiple tasks are jointly learned by a shared learning algorithm.
We deal with heterogeneous MTL, simultaneously addressing detection, classification & regression problems.
We build FaceBehaviorNet, the first framework for large-scale face analysis, by jointly learning all facial behavior tasks.
arXiv Detail & Related papers (2021-05-08T22:26:52Z)
- The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose [108.21037046507483]
IKEA ASM is a three million frame, multi-view, furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human pose.
We benchmark prominent methods for video action recognition, object segmentation and human pose estimation tasks on this challenging dataset.
The dataset enables the development of holistic methods, which integrate multi-modal and multi-view data to better perform on these tasks.
arXiv Detail & Related papers (2020-07-01T11:34:46Z)