SHERLock: Self-Supervised Hierarchical Event Representation Learning
- URL: http://arxiv.org/abs/2010.02556v2
- Date: Mon, 22 Aug 2022 18:14:34 GMT
- Title: SHERLock: Self-Supervised Hierarchical Event Representation Learning
- Authors: Sumegh Roychowdhury, Sumedh A. Sontakke, Nikaash Puri, Mausoom Sarkar,
Milan Aggarwal, Pinkesh Badjatiya, Balaji Krishnamurthy, Laurent Itti
- Abstract summary: We propose a model that learns temporal representations from long-horizon visual demonstration data.
Our method produces a hierarchy of representations that align more closely with ground-truth human-annotated events.
- Score: 22.19386609894017
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Temporal event representations are an essential aspect of learning among
humans. They allow for succinct encoding of the experiences we have through a
variety of sensory inputs. Also, they are believed to be arranged
hierarchically, allowing for an efficient representation of complex
long-horizon experiences. Additionally, these representations are acquired in a
self-supervised manner. Analogously, here we propose a model that learns
temporal representations from long-horizon visual demonstration data and
associated textual descriptions, without explicit temporal supervision. Our
method produces a hierarchy of representations that align more closely with
ground-truth human-annotated events (+15.3) than state-of-the-art unsupervised
baselines.
Our results are comparable to those of heavily-supervised baselines on complex visual
datasets such as Chess Openings, YouCook2, and TutorialVQA. Finally, we
perform ablation studies illustrating the robustness of our approach. We
release our code and demo visualizations in the Supplementary Material.
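The abstract reports that the learned hierarchy aligns more closely with human-annotated events (+15.3) than unsupervised baselines. As an illustration only, the sketch below scores alignment between predicted and ground-truth event boundaries using boundary F1 within a frame tolerance; this metric and all names in it are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch: scoring how well predicted event boundaries
# align with ground-truth human annotations. Boundary F1 within a
# frame tolerance is an illustrative stand-in for the paper's metric.

def boundary_f1(predicted, ground_truth, tolerance=2):
    """Greedily match each predicted boundary to an unused ground-truth
    boundary within `tolerance` frames, then return the F1 score."""
    if not predicted or not ground_truth:
        return 0.0
    matched = set()
    hits = 0
    for p in predicted:
        for i, g in enumerate(ground_truth):
            if i not in matched and abs(p - g) <= tolerance:
                matched.add(i)
                hits += 1
                break
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# Toy usage: a two-level event hierarchy would simply be two boundary
# lists, each scored against the corresponding annotation level.
pred_events = [10, 31, 58, 90]   # predicted boundary frames
true_events = [12, 30, 60, 88]   # human-annotated boundary frames
print(boundary_f1(pred_events, true_events))  # 1.0: all within tolerance
```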
Related papers
- Universal Time-Series Representation Learning: A Survey [14.340399848964662]
Time-series data exists in every corner of real-world systems and services.
Deep learning has demonstrated remarkable performance in extracting hidden patterns and features from time-series data.
arXiv Detail & Related papers (2024-01-08T08:00:04Z)
- Unsupervised Representation Learning for Time Series: A Review [20.00853543048447]
Unsupervised representation learning approaches aim to learn discriminative feature representations from unlabeled data, without the requirement of annotating every sample.
We conduct a literature review of existing rapidly evolving unsupervised representation learning approaches for time series.
We empirically evaluate state-of-the-art approaches, especially the rapidly evolving contrastive learning methods, on 9 diverse real-world datasets.
arXiv Detail & Related papers (2023-08-03T07:28:06Z)
- On the Generalization of Learned Structured Representations [5.1398743023989555]
We study methods that learn, with little or no supervision, representations of unstructured data that capture its hidden structure.
The second part of this thesis focuses on object-centric representations, which capture the compositional structure of the input in terms of symbol-like entities.
arXiv Detail & Related papers (2023-04-25T17:14:36Z)
- OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions [94.31804364707575]
We propose Omni-suPErvised Representation leArning with hierarchical supervisions (OPERA) as a solution.
We extract a set of hierarchical proxy representations for each image and impose self and full supervisions on the corresponding proxy representations.
Experiments on both convolutional neural networks and vision transformers demonstrate the superiority of OPERA in image classification, segmentation, and object detection.
arXiv Detail & Related papers (2022-10-11T15:51:31Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation [16.643709221279764]
We propose a novel pretext task: spatio-temporal overlap rate (STOR) prediction.
It stems from the observation that humans are capable of discriminating the overlap rates of videos in space and time.
We employ a joint task combining contrastive learning to further enhance spatio-temporal representation learning.
arXiv Detail & Related papers (2021-12-16T14:31:22Z)
- Interpretable Time-series Representation Learning With Multi-Level Disentanglement [56.38489708031278]
Disentangle Time Series (DTS) is a novel disentanglement enhancement framework for sequential data.
DTS generates hierarchical semantic concepts as the interpretable and disentangled representation of time-series.
DTS achieves superior performance in downstream applications, with high interpretability of semantic concepts.
arXiv Detail & Related papers (2021-05-17T22:02:24Z)
- Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Laplacian Denoising Autoencoder [114.21219514831343]
We propose to learn data representations with a novel type of denoising autoencoder.
The noisy input data is generated by corrupting latent clean data in the gradient domain.
Experiments on several visual benchmarks demonstrate that better representations can be learned with the proposed approach.
arXiv Detail & Related papers (2020-03-30T16:52:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.