Learning Behavior Representations Through Multi-Timescale Bootstrapping
- URL: http://arxiv.org/abs/2206.07041v1
- Date: Tue, 14 Jun 2022 17:57:55 GMT
- Title: Learning Behavior Representations Through Multi-Timescale Bootstrapping
- Authors: Mehdi Azabou, Michael Mendelson, Maks Sorokin, Shantanu Thakoor,
Nauman Ahad, Carolina Urzay, Eva L. Dyer
- Abstract summary: We introduce Bootstrap Across Multiple Scales (BAMS), a multi-scale representation learning model for behavior.
We first apply our method on a dataset of quadrupeds navigating in different terrain types, and show that our model captures the temporal complexity of behavior.
- Score: 8.543808476554695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural behavior consists of dynamics that are unpredictable, can switch suddenly, and unfold over many different timescales. While some success has
been found in building representations of behavior under constrained or
simplified task-based conditions, many of these models cannot be applied to free and naturalistic settings because they assume a single scale
of temporal dynamics. In this work, we introduce Bootstrap Across Multiple
Scales (BAMS), a multi-scale representation learning model for behavior: we
combine a pooling module that aggregates features extracted by encoders with
different temporal receptive fields, and design a set of latent objectives to
bootstrap the representations in each respective space to encourage
disentanglement across different timescales. We first apply our method on a
dataset of quadrupeds navigating in different terrain types, and show that our
model captures the temporal complexity of behavior. We then apply our method to
the MABe 2022 Multi-agent behavior challenge, where our model ranks 3rd overall
and 1st on two subtasks, and show the importance of incorporating multiple timescales when analyzing behavior.
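The abstract describes an architecture with parallel encoders of different temporal receptive fields, a pooling module that aggregates their features, and latent bootstrapping objectives kept separate per timescale. The following is a minimal sketch of that idea in PyTorch, assuming dilated 1-D convolutional encoders and a BYOL-style loss against an EMA target network; all module names, sizes, and the loss choice are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a BAMS-style multi-timescale model (PyTorch).
# Assumptions (not from the paper's code): dilated 1-D conv encoders with short
# vs. long receptive fields, mean-pooled embeddings, and a BYOL-style
# bootstrapping loss per timescale against an EMA target network.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalEncoder(nn.Module):
    """Stack of dilated 1-D convolutions; dilations set the temporal receptive field."""
    def __init__(self, in_dim, hid_dim, out_dim, dilations):
        super().__init__()
        layers, d_in = [], in_dim
        for d in dilations:
            layers += [nn.Conv1d(d_in, hid_dim, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            d_in = hid_dim
        layers.append(nn.Conv1d(d_in, out_dim, kernel_size=1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (batch, channels, time)
        return self.net(x)         # (batch, out_dim, time)


class MultiTimescaleModel(nn.Module):
    def __init__(self, in_dim=24, hid_dim=64, emb_dim=32):
        super().__init__()
        # Short- and long-timescale encoders differ only in receptive field.
        self.enc_short = TemporalEncoder(in_dim, hid_dim, emb_dim, dilations=[1, 2])
        self.enc_long = TemporalEncoder(in_dim, hid_dim, emb_dim, dilations=[1, 4, 16])
        # One predictor per timescale, so each latent space is bootstrapped separately.
        self.pred_short = nn.Linear(emb_dim, emb_dim)
        self.pred_long = nn.Linear(emb_dim, emb_dim)

    def embed(self, x):
        z_s = self.enc_short(x).mean(dim=-1)   # pool over time
        z_l = self.enc_long(x).mean(dim=-1)
        return z_s, z_l

    def forward(self, x):
        # Pooling module: concatenate per-timescale embeddings into one representation.
        z_s, z_l = self.embed(x)
        return torch.cat([z_s, z_l], dim=-1)


def bootstrap_loss(online, target, x_a, x_b):
    """BYOL-style objective: online predictions regress EMA-target embeddings,
    computed separately for each timescale to encourage disentanglement."""
    zs_a, zl_a = online.embed(x_a)
    with torch.no_grad():
        zs_b, zl_b = target.embed(x_b)
    loss_s = 1 - F.cosine_similarity(online.pred_short(zs_a), zs_b, dim=-1).mean()
    loss_l = 1 - F.cosine_similarity(online.pred_long(zl_a), zl_b, dim=-1).mean()
    return loss_s + loss_l


if __name__ == "__main__":
    online = MultiTimescaleModel()
    target = copy.deepcopy(online)           # EMA target network (update step omitted)
    x_a = torch.randn(8, 24, 200)            # two views of a behavior clip (batch, feat, time)
    x_b = torch.randn(8, 24, 200)
    print(bootstrap_loss(online, target, x_a, x_b))
```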
Related papers
- Semantic-Guided Multimodal Sentiment Decoding with Adversarial Temporal-Invariant Learning [22.54577327204281]
Multimodal sentiment analysis aims to learn representations from different modalities to identify human emotions.
Existing works often neglect the frame-level redundancy inherent in continuous time series, resulting in incomplete modality representations with noise.
We propose temporal-invariant learning for the first time, which constrains the distributional variations over time steps to effectively capture long-term temporal dynamics.
arXiv Detail & Related papers (2024-08-30T03:28:40Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent [53.637837706712794]
We propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs.
Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction.
We benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation.
arXiv Detail & Related papers (2024-05-27T22:15:23Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated to be a cheap and scalable strategy for constructing a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of discrete (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action
Segmentation [89.86345494602642]
Existing methods are limited by weak temporal modeling capability.
We propose a Decoupled Spatial-Temporal Framework (DeST) to address these issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - Relax, it doesn't matter how you get there: A new self-supervised
approach for multi-timescale behavior analysis [8.543808476554695]
We develop a multi-task representation learning model for behavior that combines two novel components.
Our model ranks 1st overall and on all global tasks, and 1st or 2nd on 7 out of 9 frame-level tasks.
arXiv Detail & Related papers (2023-03-15T17:58:48Z) - A Multi-view Multi-task Learning Framework for Multi-variate Time Series
Forecasting [42.061275727906256]
We propose a novel multi-view multi-task (MVMT) learning framework for MTS forecasting.
MVMT information is deeply concealed in the MTS data, which severely hinders the model from capturing it naturally.
We develop two kinds of basic operations, namely task-wise affine transformation and task-wise normalization.
arXiv Detail & Related papers (2021-09-02T06:11:26Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)
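Regarding the copula-based multi-agent imitation learning entry listed above: the core idea of learning per-agent marginals separately from a shared dependence structure can be illustrated with a toy Gaussian copula. This is a hedged, generic sketch in NumPy/SciPy, not that paper's model; the synthetic data and all names are assumptions for illustration only.

```python
# Toy illustration of separating per-agent marginals from a copula that
# captures cross-agent dependence (generic Gaussian-copula sketch, not the
# model from "Multi-Agent Imitation Learning with Copulas").
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Correlated latent normals induce coordination between two agents.
latent = rng.multivariate_normal(mean=[0.0, 0.0],
                                 cov=[[1.0, 0.8], [0.8, 1.0]], size=5000)
# Each agent has its own marginal: agent 1 roughly uniform on [0, 10],
# agent 2 heavy-tailed (exponential).
actions = np.column_stack([
    stats.norm.cdf(latent[:, 0]) * 10.0,
    stats.expon.ppf(stats.norm.cdf(latent[:, 1])),
])

# Step 1 (marginals): model each agent's local behavior separately,
# here via empirical CDFs (pseudo-observations in (0, 1)).
u = np.column_stack([
    stats.rankdata(actions[:, i]) / (actions.shape[0] + 1) for i in range(2)
])

# Step 2 (copula): map to normal scores and estimate the cross-agent
# dependence, which is independent of either agent's marginal.
z = stats.norm.ppf(u)
copula_corr = np.corrcoef(z, rowvar=False)
print("estimated cross-agent dependence:", round(float(copula_corr[0, 1]), 3))
```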
This list is automatically generated from the titles and abstracts of the papers on this site.