Self-Supervised Keypoint Discovery in Behavioral Videos
- URL: http://arxiv.org/abs/2112.05121v1
- Date: Thu, 9 Dec 2021 18:55:53 GMT
- Title: Self-Supervised Keypoint Discovery in Behavioral Videos
- Authors: Jennifer J. Sun, Serim Ryou, Roni Goldshmid, Brandon Weissbourd, John
Dabiri, David J. Anderson, Ann Kennedy, Yisong Yue, Pietro Perona
- Abstract summary: We propose a method for learning the posture and structure of agents from unlabelled behavioral videos.
Our method uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the difference between video frames.
By focusing only on regions of movement, our approach works directly on input videos without requiring manual annotations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for learning the posture and structure of agents from
unlabelled behavioral videos. Starting from the observation that behaving
agents are generally the main sources of movement in behavioral videos, our
method uses an encoder-decoder architecture with a geometric bottleneck to
reconstruct the difference between video frames. By focusing only on regions of
movement, our approach works directly on input videos without requiring manual
annotations, such as keypoints or bounding boxes. Experiments on a variety of
agent types (mouse, fly, human, jellyfish, and trees) demonstrate the
generality of our approach and reveal that our discovered keypoints represent
semantically meaningful body parts, which achieve state-of-the-art performance
on keypoint regression among self-supervised methods. Additionally, our
discovered keypoints achieve comparable performance to supervised keypoints on
downstream tasks, such as behavior classification, suggesting that our method
can dramatically reduce the cost of model training vis-a-vis supervised
methods.
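To make the architecture concrete, here is a minimal PyTorch sketch of the idea, not the authors' released code: an encoder predicts K heatmaps, a spatial soft-argmax acts as the geometric bottleneck, Gaussian renderings of the resulting keypoints drive a decoder, and the reconstruction target is the absolute frame difference. The layer widths, number of keypoints, Gaussian width, and plain L2 loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricBottleneck(nn.Module):
    """Collapse K heatmaps into K (x, y) coordinates via a spatial soft-argmax."""
    def forward(self, heatmaps):                         # (B, K, H, W)
        B, K, H, W = heatmaps.shape
        probs = heatmaps.flatten(2).softmax(-1).view(B, K, H, W)
        ys = torch.linspace(-1, 1, H, device=heatmaps.device)
        xs = torch.linspace(-1, 1, W, device=heatmaps.device)
        y = (probs.sum(3) * ys).sum(2)                   # expected row coordinate
        x = (probs.sum(2) * xs).sum(2)                   # expected column coordinate
        return torch.stack([x, y], -1)                   # (B, K, 2)

def render_gaussians(kps, H, W, sigma=0.1):
    """Turn keypoints back into Gaussian maps the decoder can consume."""
    ys = torch.linspace(-1, 1, H, device=kps.device).view(1, 1, H, 1)
    xs = torch.linspace(-1, 1, W, device=kps.device).view(1, 1, 1, W)
    x = kps[..., 0].view(kps.shape[0], kps.shape[1], 1, 1)
    y = kps[..., 1].view(kps.shape[0], kps.shape[1], 1, 1)
    return torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

class KeypointDiscovery(nn.Module):
    def __init__(self, n_keypoints=10):
        super().__init__()
        self.encoder = nn.Sequential(                    # frame -> K heatmaps at (H/4, W/4)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_keypoints, 1))
        self.bottleneck = GeometricBottleneck()
        self.decoder = nn.Sequential(                    # Gaussian maps -> difference image
            nn.Conv2d(n_keypoints, 64, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(), nn.Upsample(scale_factor=2),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, frame_t, frame_tk):
        kps = self.bottleneck(self.encoder(frame_tk))    # keypoints of the later frame
        maps = render_gaussians(kps, frame_t.shape[-2] // 4, frame_t.shape[-1] // 4)
        recon = self.decoder(maps)
        target = (frame_tk - frame_t).abs()              # movement-only reconstruction target
        return F.mse_loss(recon, target), kps

loss, kps = KeypointDiscovery()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```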
Related papers
- Learning Keypoints for Multi-Agent Behavior Analysis using Self-Supervision
B-KinD-multi is a novel approach that leverages pre-trained video segmentation models to guide keypoint discovery in multi-agent scenarios.
Extensive evaluations demonstrate improved keypoint regression and downstream behavioral classification in videos of flies, mice, and rats.
Our method generalizes well to other species, including ants, bees, and humans.
arXiv Detail & Related papers (2024-09-14T14:46:44Z)
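As a rough illustration of the guidance mechanism (an assumed form, not the B-KinD-multi implementation), the sketch below gates both frames with per-agent masks from a stand-in segmentation model and runs a single-agent discovery model, such as the KeypointDiscovery sketch above, once per agent; segment_agents is a hypothetical placeholder.

```python
import torch

def segment_agents(frame, n_agents):
    """Stand-in for a pre-trained video segmentation model: one soft mask per agent."""
    h, w = frame.shape[-2:]
    return torch.rand(n_agents, h, w)                    # a real model's masks would go here

def multi_agent_keypoints(model, frame_t, frame_tk, n_agents=2):
    """Gate both frames with each agent's mask and run single-agent discovery per agent."""
    masks = segment_agents(frame_t, n_agents)            # (A, H, W)
    losses, keypoints = [], []
    for m in masks:                                      # one discovery pass per agent
        m = m.view(1, 1, *m.shape)                       # broadcast over batch and channels
        loss, kps = model(frame_t * m, frame_tk * m)
        losses.append(loss)
        keypoints.append(kps)
    return torch.stack(losses).mean(), torch.cat(keypoints, dim=1)   # (1, A*K, 2)
```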
- LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks.
We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z)
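The flow-guided graph-cut step can be pictured on a toy grid. In this assumed-form sketch (not the LOCATE code), flow magnitude supplies the source/sink capacities, appearance similarity the pairwise smoothness, and networkx computes the min-cut; the resulting mask would serve as a pseudo-label for the bootstrapped self-training stage.

```python
import networkx as nx
import numpy as np

def flow_guided_graphcut(flow_mag, gray, lam=2.0):
    """Min-cut segmentation of a tiny grid: flow magnitude as unary, appearance as pairwise."""
    H, W = flow_mag.shape
    g = nx.DiGraph()
    for i in range(H):
        for j in range(W):
            p = (i, j)
            g.add_edge("src", p, capacity=float(flow_mag[i, j]))   # moving pixels lean "object"
            g.add_edge(p, "snk", capacity=float(flow_mag.mean()))  # weak background prior
            for q in ((i + 1, j), (i, j + 1)):                     # 4-neighbour smoothness
                if q[0] < H and q[1] < W:
                    w = lam * np.exp(-abs(float(gray[i, j]) - float(gray[q])))
                    g.add_edge(p, q, capacity=w)
                    g.add_edge(q, p, capacity=w)
    _, (src_side, _) = nx.minimum_cut(g, "src", "snk")
    mask = np.zeros((H, W), dtype=bool)
    for node in src_side - {"src"}:
        mask[node] = True
    return mask        # True = object; would become a pseudo-label for self-training

mask = flow_guided_graphcut(np.random.rand(8, 8), np.random.rand(8, 8))
```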
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
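A hypothetical reduction of the joint representation: a single shared encoder maps frames from either video to one keypoint vector, while per-domain decoders restore appearance, so motion retargets by decoding one domain's keypoints with the other domain's decoder. All shapes and layer choices below are illustrative, not JOKR's architecture.

```python
import torch
import torch.nn as nn

class JointKeypointModel(nn.Module):
    def __init__(self, n_keypoints=8):
        super().__init__()
        # Shared encoder: frames from both domains project into one keypoint space.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_keypoints * 2))              # K (x, y) pairs as one vector
        # One decoder per domain restores domain-specific appearance.
        self.decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(n_keypoints * 2, 32 * 8 * 8), nn.ReLU(),
                          nn.Unflatten(1, (32, 8, 8)),
                          nn.Upsample(scale_factor=8),
                          nn.Conv2d(32, 3, 3, padding=1))
            for _ in range(2)])

    def forward(self, frame, domain):
        kps = self.encoder(frame)                        # shared keypoint representation
        return self.decoders[domain](kps), kps

# Cross-domain retargeting: encode motion from one video, decode in the other domain.
model = JointKeypointModel()
recon_b, _ = model(torch.rand(1, 3, 64, 64), domain=1)
```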
- Learning Actor-centered Representations for Action Localization in Streaming Videos using Predictive Learning
Event perception tasks such as recognizing and localizing actions in streaming videos are essential for tackling visual understanding tasks.
We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning.
Inspired by cognitive theories of event perception, we propose a novel, self-supervised framework.
arXiv Detail & Related papers (2021-04-29T06:06:58Z)
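Purely as an assumed-form toy (not the paper's hierarchical framework), predictive learning for localization can be pictured as follows: a predictor forecasts the next frame's features, and the per-location prediction error, which concentrates where motion is hardest to forecast, is read out as an actor heatmap.

```python
import torch
import torch.nn as nn

features = nn.Conv2d(3, 16, 3, padding=1)      # toy feature extractor
predictor = nn.Conv2d(16, 16, 3, padding=1)    # forecasts the next frame's features

def actor_heatmap(frame_t, frame_t1):
    pred = predictor(features(frame_t))                  # prediction for time t+1
    err = (pred - features(frame_t1)).pow(2).mean(1)     # (B, H, W) per-location error
    # Training minimizes this error; residual "surprise" concentrates on the actor.
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

heat = actor_heatmap(torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```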
- Self-supervised Video Object Segmentation by Motion Grouping
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
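A minimal attention-based grouping sketch in the same spirit, not the paper's Transformer variant: learnable slot queries for object and background attend over per-pixel flow features, and a softmax across slots yields competing soft masks. A real model would iterate the attention and train by reconstructing the flow from the slots.

```python
import torch
import torch.nn as nn

class FlowGrouper(nn.Module):
    def __init__(self, dim=32, n_slots=2):
        super().__init__()
        self.embed = nn.Linear(2, dim)                   # flow (dx, dy) -> per-pixel feature
        self.slots = nn.Parameter(torch.randn(n_slots, dim))
        self.scale = dim ** -0.5

    def forward(self, flow):                             # (B, H, W, 2)
        B, H, W, _ = flow.shape
        k = self.embed(flow.view(B, H * W, 2))           # keys, one per pixel
        attn = torch.einsum("sd,bnd->bsn", self.slots, k) * self.scale
        masks = attn.softmax(dim=1)                      # slots compete for each pixel
        return masks.view(B, -1, H, W)                   # (B, n_slots, H, W) soft masks

masks = FlowGrouper()(torch.rand(2, 16, 16, 2))          # object / background masks
```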
- Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos
We present an approach for physical imitation from human videos for robot manipulation tasks.
We design a perception module that learns to translate human videos to the robot domain, followed by unsupervised keypoint detection.
We evaluate the effectiveness of our approach on five robot manipulation tasks, including reaching, pushing, sliding, coffee making, and drawer closing.
arXiv Detail & Related papers (2021-01-18T18:50:32Z)
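The two-stage pipeline can be stubbed out as below; both modules (an image-to-image translator and an unsupervised keypoint head) are toy stand-ins assumed for illustration, not the paper's networks.

```python
import torch
import torch.nn as nn

human_to_robot = nn.Sequential(                  # image-to-image translator stub
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1))

keypointer = nn.Sequential(                      # unsupervised keypoint head stub
    nn.Conv2d(3, 8, 3, stride=4, padding=1))     # 8 coarse keypoint heatmaps

def perceive(human_frame):
    robot_frame = human_to_robot(human_frame)    # human video -> robot domain
    heatmaps = keypointer(robot_frame)           # keypoints in the robot domain
    return robot_frame, heatmaps

frame, hm = perceive(torch.rand(1, 3, 64, 64))   # hm: (1, 8, 16, 16)
```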
- Self-supervised Segmentation via Background Inpainting
We introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera.
We exploit a self-supervised loss function to train a proposal-based segmentation network.
We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
arXiv Detail & Related papers (2020-11-11T08:34:40Z)
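A toy rendition of the presumed supervisory signal: a region that cannot be reconstructed from surrounding background is likely foreground, so the inpainting residual inside a candidate mask scores how object-like the mask is. The diffusion-style inpainter below is a crude stand-in for a learned one, and foreground_score is a hypothetical name.

```python
import torch
import torch.nn.functional as F

def inpaint_from_background(image, mask, iters=50):
    """Crude diffusion inpainting: repeatedly blur, clamping unmasked pixels to the input."""
    kernel = torch.ones(3, 1, 3, 3) / 9.0               # depthwise 3x3 box filter
    filled = image * (1 - mask)
    for _ in range(iters):
        blurred = F.conv2d(filled, kernel, padding=1, groups=3)
        filled = image * (1 - mask) + blurred * mask    # diffuse background into the hole
    return filled

def foreground_score(image, mask):
    """High residual inside the mask = region not explainable by background."""
    recon = inpaint_from_background(image, mask)
    return F.mse_loss(recon * mask, image * mask)

score = foreground_score(torch.rand(1, 3, 32, 32), (torch.rand(1, 1, 32, 32) > 0.8).float())
```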
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)
- Self-supervised Video Object Segmentation
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iii) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube-VOS.
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
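The memory mechanism admits a compact sketch (assumed form, not the paper's code): features and object labels from past frames form a memory, the current frame's features attend over it, and labels are copied from the matched locations; propagate_mask and the temperature value are illustrative.

```python
import torch

def propagate_mask(mem_feats, mem_masks, query_feats, temperature=0.07):
    """
    mem_feats:   (N, C) features of memory pixels (several past frames)
    mem_masks:   (N, K) one-hot object labels of those pixels
    query_feats: (M, C) features of current-frame pixels
    returns      (M, K) soft labels for the current frame
    """
    mem = torch.nn.functional.normalize(mem_feats, dim=1)
    qry = torch.nn.functional.normalize(query_feats, dim=1)
    affinity = (qry @ mem.t()) / temperature             # (M, N) cosine similarities
    weights = affinity.softmax(dim=1)                    # attention over the memory
    return weights @ mem_masks                           # copy labels from matches

labels = propagate_mask(torch.rand(100, 16),
                        torch.eye(2)[torch.randint(0, 2, (100,))],
                        torch.rand(64, 16))
```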