What can human minimal videos tell us about dynamic recognition models?
- URL: http://arxiv.org/abs/2104.09447v1
- Date: Mon, 19 Apr 2021 16:53:25 GMT
- Title: What can human minimal videos tell us about dynamic recognition models?
- Authors: Guy Ben-Yosef, Gabriel Kreiman, Shimon Ullman
- Abstract summary: In human vision, objects and their parts can be visually recognized from purely spatial or purely temporal information.
We show that human visual recognition of objects and actions can be achieved by efficiently combining spatial and motion cues.
- Score: 14.201816626446888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In human vision, objects and their parts can be visually recognized from
purely spatial or purely temporal information, but the mechanisms integrating
space and time are poorly understood. Here we show that human visual
recognition of objects and actions can be achieved by efficiently combining
spatial and motion cues in configurations where each source on its own is
insufficient for recognition. This analysis is obtained by identifying minimal
videos: these are short and tiny video clips in which objects, parts, and
actions can be reliably recognized, but any reduction in either space or time
makes them unrecognizable. State-of-the-art deep networks for dynamic visual
recognition cannot replicate human behavior in these configurations. This gap
between humans and machines points to critical mechanisms in human dynamic
vision that are lacking in current models.
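The minimality criterion described in the abstract (a clip is recognizable as-is, but becomes unrecognizable after any single further reduction in space or time) can be made concrete with a short search procedure. The sketch below is an illustrative reformulation, not the authors' code: the `recognize` callable is a hypothetical stand-in for the human observers or dynamic recognition models evaluated in the paper, and the specific reduction steps (one-pixel border trims, a single resolution step, single-frame drops) are assumptions for illustration.

```python
# Minimal sketch (not the authors' code) of the reduction test behind a
# "minimal video": a clip qualifies if it is recognized as-is, while every
# one-step reduction in space (trimming a pixel border, one resolution step)
# or in time (dropping a frame) is no longer recognized.
import numpy as np
from typing import Callable, Iterator

Clip = np.ndarray  # shape: (frames, height, width); grayscale for simplicity


def spatial_reductions(clip: Clip) -> Iterator[Clip]:
    """Yield one-step spatial reductions of the clip."""
    t, h, w = clip.shape
    if h > 1:
        yield clip[:, 1:, :]   # trim top pixel row
        yield clip[:, :-1, :]  # trim bottom pixel row
    if w > 1:
        yield clip[:, :, 1:]   # trim left pixel column
        yield clip[:, :, :-1]  # trim right pixel column
    if h > 4 and w > 4:
        # one coarse resolution step: drop every fifth row and column
        keep_rows = [i for i in range(h) if i % 5 != 4]
        keep_cols = [j for j in range(w) if j % 5 != 4]
        yield clip[:, keep_rows][:, :, keep_cols]


def temporal_reductions(clip: Clip) -> Iterator[Clip]:
    """Yield the clip with one frame removed, for every frame."""
    if clip.shape[0] > 1:
        for i in range(clip.shape[0]):
            yield np.delete(clip, i, axis=0)


def is_minimal_video(clip: Clip, recognize: Callable[[Clip], bool]) -> bool:
    """True if the clip is recognized but none of its one-step reductions are."""
    if not recognize(clip):
        return False
    reductions = list(spatial_reductions(clip)) + list(temporal_reductions(clip))
    return not any(recognize(r) for r in reductions)


if __name__ == "__main__":
    # Toy usage: a random 2-frame, 20x20-pixel clip and a stand-in recognizer
    # that accepts only clips with at least 2 frames and 20x20 pixels.
    toy_clip = np.random.rand(2, 20, 20)
    toy_recognizer = lambda c: c.shape[0] >= 2 and min(c.shape[1:]) >= 20
    print(is_minimal_video(toy_clip, toy_recognizer))  # -> True
```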
Related papers
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Seeing Objects in a Cluttered World: Computational Objectness from
Motion in Video [0.0]
Perceiving the visually disjoint surfaces of our world as whole objects, physically distinct from those overlapping them, forms the basis of our visual perception.
We present a simple but novel approach to infer objectness from phenomenology without object models.
We show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake.
arXiv Detail & Related papers (2024-02-02T03:57:11Z) - A Symbolic Representation of Human Posture for Interpretable Learning
and Reasoning [2.678461526933908]
We introduce a qualitative spatial reasoning approach that describes the human posture in terms that are more familiar to people.
This paper explores the derivation of our symbolic representation at two levels of detail and its preliminary use as features for interpretable activity recognition.
arXiv Detail & Related papers (2022-10-17T12:22:13Z) - Learning Motion-Dependent Appearance for High-Fidelity Rendering of
Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z) - Weakly Supervised Human-Object Interaction Detection in Video via
Contrastive Spatiotemporal Regions [81.88294320397826]
A system does not know which human-object interactions are present in a video, or the actual locations of the human and object.
We introduce a dataset comprising over 6.5k videos with human-object interactions that have been curated from sentence captions.
We demonstrate improved performance over weakly supervised baselines adapted to our annotations on our video dataset.
arXiv Detail & Related papers (2021-10-07T15:30:18Z) - From Movement Kinematics to Object Properties: Online Recognition of
Human Carefulness [112.28757246103099]
We show how a robot can infer online, from vision alone, whether or not the human partner is careful when moving an object.
We demonstrated that a humanoid robot could perform this inference with high accuracy (up to 81.3%) even with a low-resolution camera.
The prompt recognition of movement carefulness from observing the partner's action will allow robots to adapt their actions on the object to show the same degree of care as their human partners.
arXiv Detail & Related papers (2021-09-01T16:03:13Z) - Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) is able to automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling of human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z) - Continuous Emotion Recognition with Spatiotemporal Convolutional Neural
Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning (a minimal sketch of the 2D-CNN + LSTM pattern appears after this list).
arXiv Detail & Related papers (2020-11-18T13:42:05Z) - View-invariant action recognition [3.553493344868414]
The varying pattern of spatio-temporal appearance generated by human action is key to identifying the action performed.
Research in view-invariant action recognition addresses the problem of recognizing human actions from unseen viewpoints.
arXiv Detail & Related papers (2020-09-01T18:08:46Z)
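The 2D-CNN + LSTM combination mentioned in the continuous emotion recognition entry above is a common pattern for per-frame video regression. The following PyTorch sketch is only an assumed illustration of that pattern, not the paper's released code: the tiny convolutional backbone, layer sizes, and two-dimensional output (e.g. valence/arousal) are placeholders, whereas the paper fine-tunes pre-trained 2D-CNN backbones and also evaluates inflated 3D-CNNs.

```python
# Hypothetical 2D-CNN + LSTM video regressor (an illustrative assumption,
# not the paper's code): a 2D CNN extracts per-frame features, an LSTM
# aggregates them over time, and a linear head outputs continuous values.
import torch
import torch.nn as nn


class CNNLSTMRegressor(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden_dim: int = 64, out_dim: int = 2):
        super().__init__()
        # Small stand-in backbone; a pre-trained 2D-CNN would be used in practice.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, channels, height, width)
        b, t, c, h, w = video.shape
        feats = self.cnn(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        seq, _ = self.lstm(feats)  # per-frame hidden states
        return self.head(seq)      # continuous prediction for every frame


if __name__ == "__main__":
    model = CNNLSTMRegressor()
    clip = torch.randn(2, 16, 3, 64, 64)  # 2 clips of 16 RGB frames, 64x64
    print(model(clip).shape)              # -> torch.Size([2, 16, 2])
```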
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.