MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations
of Behavior
- URL: http://arxiv.org/abs/2207.10553v2
- Date: Fri, 30 Jun 2023 22:45:47 GMT
- Title: MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations
of Behavior
- Authors: Jennifer J. Sun, Markus Marks, Andrew Ulmer, Dipam Chakraborty, Brian
Geuther, Edward Hayes, Heng Jia, Vivek Kumar, Sebastian Oleszko, Zachary
Partridge, Milan Peelman, Alice Robie, Catherine E. Schretter, Keith
Sheppard, Chao Sun, Param Uttarwar, Julian M. Wagner, Eric Werner, Joseph
Parker, Pietro Perona, Yisong Yue, Kristin Branson, Ann Kennedy
- Abstract summary: We introduce MABe22, a benchmark to assess the quality of learned behavior representations.
This dataset is collected from a variety of biology experiments.
We test self-supervised video and trajectory representation learning methods to demonstrate the use of our benchmark.
- Score: 28.878568752724235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MABe22, a large-scale, multi-agent video and trajectory
benchmark to assess the quality of learned behavior representations. This
dataset is collected from a variety of biology experiments, and includes
triplets of interacting mice (4.7 million frames video+pose tracking data, 10
million frames pose only), symbiotic beetle-ant interactions (10 million frames
video data), and groups of interacting flies (4.4 million frames of pose
tracking data). Accompanying these data, we introduce a panel of real-life
downstream analysis tasks to assess the quality of learned representations by
evaluating how well they preserve information about the experimental conditions
(e.g. strain, time of day, optogenetic stimulation) and animal behavior. We
test multiple state-of-the-art self-supervised video and trajectory
representation learning methods to demonstrate the use of our benchmark,
revealing that methods developed using human action datasets do not fully
translate to animal datasets. We hope that our benchmark and dataset encourage
a broader exploration of behavior representation learning methods across
species and settings.
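To make the evaluation idea above concrete, here is a minimal sketch of a downstream readout: the learned representation is kept frozen and a simple linear probe is trained to predict an experimental condition, so probe performance measures how much information the embedding preserves. The array shapes, the binary condition, and the choice of a logistic-regression probe are illustrative assumptions, not the benchmark's official evaluation code.

```python
# A minimal sketch of a frozen-representation linear probe, assuming a learned
# per-frame embedding and a labeled experimental condition. Shapes, the binary
# label, and the logistic-regression readout are illustrative assumptions,
# not the official MABe22 evaluation code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 10,000 frames embedded into 128 dimensions, each frame
# labeled with a binary condition (e.g. optogenetic stimulation on/off).
embeddings = rng.normal(size=(10_000, 128))
condition = rng.integers(0, 2, size=10_000)

# Train a linear probe on half the frames, score it on the other half.
split = 5_000
probe = LogisticRegression(max_iter=1_000).fit(embeddings[:split], condition[:split])
pred = probe.predict(embeddings[split:])

# Higher probe performance means the frozen representation preserves more
# information about the experimental condition, without fine-tuning the encoder.
print("probe F1:", f1_score(condition[split:], pred))
```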
Related papers
- GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding [2.79453284883108]
This study evaluates the visual perception capabilities of multimodal large language models in animal activity recognition.
We find that while current multimodal LLMs still need improvement in semantic correspondence and time perception, they already demonstrate initial visual perception capabilities for animal activity recognition.
arXiv Detail & Related papers (2024-06-14T07:30:26Z)
- AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming [0.0]
We introduce a multimodal vision framework for precision livestock farming.
We harness the power of GroundingDINO, HQSAM, and ViTPose models.
This suite enables comprehensive behavioral analytics from video data without invasive animal tagging.
arXiv Detail & Related papers (2024-06-14T04:42:44Z)
- From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave [0.0]
We introduce ChimpBehave, a novel dataset featuring over 2 hours of video (approximately 193,000 video frames) of zoo-housed chimpanzees.
ChimpBehave is meticulously annotated with bounding boxes and behavior labels for action recognition.
We benchmark our dataset using a state-of-the-art CNN-based action recognition model.
arXiv Detail & Related papers (2024-05-30T13:11:08Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- OmniMotionGPT: Animal Motion Generation with Limited Data [70.35662376853163]
We introduce AnimalML3D, the first text-animal motion dataset with 1240 animation sequences spanning 36 different animal identities.
We are able to generate animal motions with high diversity and fidelity, quantitatively and qualitatively outperforming the results of training human motion generation baselines on animal data.
arXiv Detail & Related papers (2023-11-30T07:14:00Z)
- Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study learning robust feature representations that generalize well across multiple action recognition datasets.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with two new loss terms, an informative loss and a projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-v2.
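For readers unfamiliar with the general setup, the sketch below shows what multi-dataset action-recognition training looks like structurally: a shared backbone with one classification head per dataset, trained on summed per-dataset losses. It is a generic illustration under assumed feature and class sizes; the paper's specific informative and projection loss terms are not reproduced here.

```python
# A generic sketch of multi-dataset training: one shared backbone, one
# classification head per dataset, optimized on the summed per-dataset losses.
# Feature sizes and the toy inputs are assumptions; the paper's informative
# and projection losses are intentionally omitted.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
heads = nn.ModuleDict({
    "kinetics400": nn.Linear(256, 400),       # 400 action classes
    "moments_in_time": nn.Linear(256, 339),   # 339 action classes
})
params = list(backbone.parameters()) + list(heads.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)

def training_step(batches):
    # batches: dict mapping dataset name -> (clip features, labels) minibatch.
    loss = 0.0
    for name, (x, y) in batches.items():
        logits = heads[name](backbone(x))
        loss = loss + nn.functional.cross_entropy(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy example with random "clip features" standing in for encoded video.
fake = {
    "kinetics400": (torch.randn(4, 512), torch.randint(0, 400, (4,))),
    "moments_in_time": (torch.randn(4, 512), torch.randint(0, 339, (4,))),
}
print(training_step(fake))
```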
arXiv Detail & Related papers (2022-09-26T01:30:43Z)
- Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding [4.606145900630665]
We create a large and diverse dataset, Animal Kingdom, that provides multiple annotated tasks.
Our dataset contains 50 hours of annotated video for localizing relevant animal behavior segments.
We propose a Collaborative Action Recognition (CARe) model that learns general and specific features for action recognition with unseen new animals.
arXiv Detail & Related papers (2022-04-18T02:05:15Z)
- PreViTS: Contrastive Pretraining with Video Tracking Supervision [53.73237606312024]
PreViTS is a self-supervised learning (SSL) framework that selects clips containing the same object.
PreViTS spatially constrains the frame regions to learn from and trains the model to locate meaningful objects.
We train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS.
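As a reminder of what a momentum contrastive (MoCo-style) setup involves, here is a simplified, generic sketch: a query embedding is matched against a key from a momentum-updated encoder and contrasted with a queue of negatives via an InfoNCE loss. The shapes, momentum, and temperature are assumptions, and this is not the PreViTS implementation (which additionally applies the spatial constraints described above).

```python
# A simplified, generic sketch of momentum contrastive (MoCo-style) learning:
# InfoNCE over a positive key and a queue of negatives, with a momentum-updated
# key encoder. All shapes and hyperparameters are illustrative assumptions;
# this is not the PreViTS implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

query_enc = nn.Linear(256, 128)                # stand-in encoders; real models
key_enc = nn.Linear(256, 128)                  # would be video backbones
key_enc.load_state_dict(query_enc.state_dict())

def momentum_update(m=0.999):
    # Key encoder follows the query encoder as an exponential moving average.
    with torch.no_grad():
        for q_p, k_p in zip(query_enc.parameters(), key_enc.parameters()):
            k_p.data.mul_(m).add_(q_p.data, alpha=1 - m)

def moco_loss(q, k, queue, temperature=0.07):
    # q, k: (batch, dim) normalized embeddings of two clips of the same video;
    # queue: (queue_size, dim) embeddings of past keys used as negatives.
    l_pos = (q * k).sum(dim=1, keepdim=True)            # positive logits
    l_neg = q @ queue.t()                               # negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)   # positives sit at index 0
    return F.cross_entropy(logits, labels)

# Toy forward pass with random clip features.
x1, x2 = torch.randn(8, 256), torch.randn(8, 256)
q = F.normalize(query_enc(x1), dim=1)
with torch.no_grad():
    k = F.normalize(key_enc(x2), dim=1)
queue = F.normalize(torch.randn(4096, 128), dim=1)
loss = moco_loss(q, k, queue)
loss.backward()
momentum_update()
print(loss.item())
```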
arXiv Detail & Related papers (2021-12-01T19:49:57Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [62.265410865423]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- The Multi-Agent Behavior Dataset: Mouse Dyadic Social Interactions [39.265388879471686]
We present a multi-agent dataset from behavioral neuroscience, the Caltech Mouse Social Interactions (CalMS21) dataset.
Our dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay.
The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021; as a next step, we aim to incorporate datasets from other domains that study multi-agent behavior.
arXiv Detail & Related papers (2021-04-06T17:58:47Z)
- Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation [100.68317848808327]
We present a large-scale dataset named "COIN" for COmprehensive INstructional video analysis.
The COIN dataset contains 11,827 videos of 180 tasks in 12 domains related to daily life.
With a newly developed toolbox, all videos are efficiently annotated with a series of step labels and their corresponding temporal boundaries (a hypothetical example record follows this entry).
arXiv Detail & Related papers (2020-03-20T16:59:44Z)
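To make the step-label annotation format concrete, here is a hypothetical record showing how one video's step labels and temporal boundaries might be stored; the field names, task, and times are illustrative assumptions, not the actual COIN schema.

```python
# A hypothetical annotation record for one instructional video: a task drawn
# from one of the 12 domains, split into steps with temporal boundaries.
# Field names and values are illustrative assumptions, not the COIN schema.
annotation = {
    "video_id": "example_video",
    "domain": "vehicles",
    "task": "change a car tire",
    "steps": [
        {"label": "unscrew the bolts", "start_sec": 12.0, "end_sec": 25.5},
        {"label": "jack up the car",   "start_sec": 26.0, "end_sec": 41.2},
        {"label": "mount the spare",   "start_sec": 41.8, "end_sec": 60.0},
    ],
}

# Total duration covered by labeled steps in this clip.
total = sum(s["end_sec"] - s["start_sec"] for s in annotation["steps"])
print(f"{total:.1f} seconds of labeled steps")
```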