Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams
- URL: http://arxiv.org/abs/2204.12193v1
- Date: Tue, 26 Apr 2022 09:52:31 GMT
- Title: Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams
- Authors: Matteo Tiezzi, Simone Marullo, Lapo Faggi, Enrico Meloni, Alessandro Betti and Stefano Melacci
- Abstract summary: This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
- Score: 64.82800502603138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Devising intelligent agents able to live in an environment and learn by
observing the surroundings is a longstanding goal of Artificial Intelligence.
From a bare Machine Learning perspective, challenges arise when the agent is
prevented from leveraging large fully-annotated datasets, and interactions
with supervisory signals are instead sparsely distributed over space and
time. This paper proposes a novel neural-network-based approach to
progressively and autonomously develop pixel-wise representations in a video
stream. The proposed method is based on a human-like attention mechanism that
allows the agent to learn by observing what is moving in the attended
locations. Spatio-temporal stochastic coherence along the attention trajectory,
paired with a contrastive term, leads to an unsupervised learning criterion
that naturally copes with the considered setting. Differently from most
existing works, the learned representations are used in open-set
class-incremental classification of each frame pixel, relying on few
supervisions. Our experiments leverage 3D virtual environments, and they show
that the proposed agents can learn to distinguish objects just by observing the
video stream. Inheriting features from state-of-the-art models is not as
powerful as one might expect.
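The learning criterion described in the abstract can be sketched in toy form: a coherence term pulls together the features of the attended location at consecutive time steps, while a contrastive term pushes them away from features sampled at other locations. The function name, the Euclidean distance, and the margin-based hinge below are illustrative assumptions, not the paper's exact formulation.

```python
import math


def coherence_contrastive_loss(feat_t, feat_tp1, negatives, margin=1.0):
    """Toy coherence + contrastive criterion (illustrative sketch only).

    feat_t, feat_tp1: feature vectors of the attended pixel at times t and t+1.
    negatives: feature vectors sampled at other (non-attended) locations.
    """
    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Coherence: attended features at consecutive steps should stay close.
    coherence = dist(feat_t, feat_tp1) ** 2
    # Contrast: other locations should be at least `margin` away (hinge).
    contrast = sum(max(0.0, margin - dist(feat_t, n)) ** 2 for n in negatives)
    return coherence + contrast / max(1, len(negatives))
```

With identical attended features and a distant negative, both terms vanish and the loss is zero; a negative that collides with the attended feature incurs the full margin penalty.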
Related papers
- Continual Learning of Conjugated Visual Representations through Higher-order Motion Flows [21.17248975377718]
Learning with neural networks presents several challenges due to the non-i.i.d. nature of the data.
It also offers novel opportunities to develop representations that are consistent with the information flow.
In this paper we investigate the case of unsupervised continual learning of pixel-wise features subject to multiple motion-induced constraints.
arXiv Detail & Related papers (2024-09-16T19:08:32Z)
- Incorporating simulated spatial context information improves the effectiveness of contrastive learning models [1.4179832037924995]
We present a unique approach, termed Environmental Spatial Similarity (ESS), that complements existing contrastive learning methods.
ESS allows remarkable proficiency in room classification and spatial prediction tasks, especially in unfamiliar environments.
Potentially transformative applications span from robotics to space exploration.
arXiv Detail & Related papers (2024-01-26T03:44:58Z)
- Adversarial Imitation Learning from Visual Observations using Latent Information [9.240917262195046]
We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source.
We introduce an algorithm called Latent Adversarial from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations.
In experiments on high-dimensional continuous robotic tasks, we show that our model-free approach in latent space matches state-of-the-art performance.
arXiv Detail & Related papers (2023-09-29T16:20:36Z)
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- Palm up: Playing in the Latent Manifold for Unsupervised Pretraining [31.92145741769497]
We propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.
Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space.
We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data.
arXiv Detail & Related papers (2022-10-19T22:26:12Z)
- Robust and Controllable Object-Centric Learning through Energy-based Models [95.68748828339059]
Our method is a conceptually simple and general approach to learning object-centric representations through an energy-based model.
We show that it can be easily integrated into existing architectures and can effectively extract high-quality object-centric representations.
arXiv Detail & Related papers (2022-10-11T15:11:15Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively, producing a representation that emphasizes the novel information in the frame at the current time-stamp.
SRL sharply outperforms existing state-of-the-art methods in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
- Trajectory annotation using sequences of spatial perception [0.0]
In the near future, more and more machines will perform tasks in the vicinity of human spaces.
This work builds a foundation to address this task.
We propose an unsupervised learning approach based on a neural autoencoding that learns semantically meaningful continuous encodings of prototypical trajectory data.
arXiv Detail & Related papers (2020-04-11T12:22:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.