Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given
Enough Time
- URL: http://arxiv.org/abs/2402.03973v1
- Date: Tue, 6 Feb 2024 13:06:14 GMT
- Title: Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given
Enough Time
- Authors: Netta Ollikka, Amro Abbas, Andrea Perin, Markku Kilpeläinen, Stéphane Deny
- Abstract summary: Humans excel at recognizing objects in unusual poses, in contrast with state-of-the-art pretrained networks.
As we limit image exposure time, human performance degrades to the level of deep networks.
Even time-limited humans are dissimilar to feed-forward deep networks.
- Score: 1.6874375111244329
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning is closing the gap with humans on several object recognition
benchmarks. Here we investigate this gap in the context of challenging images
where objects are seen from unusual viewpoints. We find that humans excel at
recognizing objects in unusual poses, in contrast with state-of-the-art
pretrained networks (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) which are
systematically brittle in this condition. Remarkably, as we limit image
exposure time, human performance degrades to the level of deep networks,
suggesting that additional mental processes (requiring additional time) take
place when humans identify objects in unusual poses. Finally, our analysis of
error patterns of humans vs. networks reveals that even time-limited humans are
dissimilar to feed-forward deep networks. We conclude that more work is needed
to bring computer vision systems to the level of robustness of the human visual
system. Understanding the nature of the mental processes taking place during
extra viewing time may be key to attaining such robustness.
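This listing does not include the paper's stimuli or analysis code. As a rough illustration of the kind of evaluation the abstract describes, the sketch below probes a few torchvision-pretrained classifiers (stand-ins for the model families named above) on object photos shown in atypical orientations and then compares trial-level error patterns with Geirhos-style error consistency. The image folder, the label-in-filename convention, the rotation angles, and the use of in-plane rotation as a proxy for unusual 3D poses are all illustrative assumptions, not the authors' protocol.

```python
# Minimal sketch (not the paper's code): probe pretrained ImageNet classifiers on
# objects in atypical orientations and compare their trial-level error patterns.
from pathlib import Path

import torch
from PIL import Image
from torchvision import models

# Hypothetical folder of upright object photos, one ImageNet class index per file,
# e.g. "cab_468.jpg" -> ground-truth index 468 (assumed naming scheme).
IMAGE_DIR = Path("upright_objects")
ANGLES = [0, 90, 135, 180]  # 0 = canonical pose; others stand in for "unusual" poses

# A few of the architecture families named in the abstract, via torchvision weights.
MODEL_BUILDERS = {
    "efficientnet_b0": (models.efficientnet_b0, models.EfficientNet_B0_Weights.IMAGENET1K_V1),
    "vit_b_16": (models.vit_b_16, models.ViT_B_16_Weights.IMAGENET1K_V1),
    "convnext_tiny": (models.convnext_tiny, models.ConvNeXt_Tiny_Weights.IMAGENET1K_V1),
}


def evaluate(model_name: str) -> dict[tuple[str, int], bool]:
    """Return per-trial correctness: (image name, angle) -> top-1 correct?"""
    builder, weights = MODEL_BUILDERS[model_name]
    model = builder(weights=weights).eval()
    preprocess = weights.transforms()
    correct: dict[tuple[str, int], bool] = {}
    with torch.no_grad():
        for path in sorted(IMAGE_DIR.glob("*.jpg")):
            true_idx = int(path.stem.split("_")[-1])  # assumed label-in-filename scheme
            img = Image.open(path).convert("RGB")
            for angle in ANGLES:
                x = preprocess(img.rotate(angle, expand=True)).unsqueeze(0)
                pred = model(x).argmax(dim=1).item()
                correct[(path.name, angle)] = (pred == true_idx)
    return correct


def error_consistency(a: dict, b: dict) -> float:
    """Cohen's kappa over shared trials: do two observers err on the *same* items?"""
    keys = sorted(set(a) & set(b))
    pa, pb = [a[k] for k in keys], [b[k] for k in keys]
    observed = sum(x == y for x, y in zip(pa, pb)) / len(keys)
    acc_a, acc_b = sum(pa) / len(keys), sum(pb) / len(keys)
    expected = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)
    return (observed - expected) / (1 - expected + 1e-12)


if __name__ == "__main__":
    results = {name: evaluate(name) for name in MODEL_BUILDERS}
    for name, trials in results.items():
        for angle in ANGLES:
            hits = [ok for (img, a), ok in trials.items() if a == angle]
            print(f"{name:>16} @ {angle:3d} deg: top-1 acc {sum(hits) / len(hits):.2f}")
    # Pairwise error consistency between networks; in the paper's setting the same
    # per-trial records would also be collected from human observers and compared.
    names = list(results)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            k = error_consistency(results[names[i]], results[names[j]])
            print(f"kappa({names[i]}, {names[j]}) = {k:.2f}")
```

In the study itself, the analogous comparison is made between networks and human observers tested at different image exposure times, not between networks alone.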
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Scene-aware Egocentric 3D Human Pose Estimation [72.57527706631964]
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality.
Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene.
We propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints.
arXiv Detail & Related papers (2022-12-20T21:35:39Z)
- A Brief Survey on Person Recognition at a Distance [46.47338660858037]
Person recognition at a distance entails recognizing the identity of an individual appearing in images or videos collected by long-range imaging systems such as drones or surveillance cameras.
Despite recent advances in deep convolutional neural networks (DCNNs), this remains challenging.
arXiv Detail & Related papers (2022-12-17T22:15:10Z)
- Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises [7.689542442882423]
We designed a dual-stream vision model inspired by the human brain.
This model features retina-like input layers and includes two streams: one determines the next point of focus (the fixation), while the other interprets the visual content surrounding the fixation.
We evaluated this model against various benchmarks in terms of object recognition, gaze behavior and adversarial robustness.
arXiv Detail & Related papers (2022-06-15T03:44:42Z)
- Embodied vision for learning object representations [4.211128681972148]
We show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments.
We argue that this effect is caused by a reduction of features extracted from the background, a neural-network bias toward large features in the image, and a greater similarity between novel and familiar background regions.
arXiv Detail & Related papers (2022-05-12T16:36:27Z)
- Robustness of Humans and Machines on Object Recognition with Extreme Image Transformations [0.0]
We introduce a novel set of image transforms and evaluate humans and networks on an object recognition task.
We find that the performance of several common networks decreases quickly under these transforms, while humans are still able to recognize the objects with high accuracy.
arXiv Detail & Related papers (2022-05-09T17:15:54Z)
- Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency [55.94908688207493]
We propose a self-supervised framework named SPICE that closes the image quality gap with supervised methods.
The key insight enabling self-supervision is to exploit 3D information about the human body in several ways.
SPICE achieves state-of-the-art performance on the DeepFashion dataset.
arXiv Detail & Related papers (2021-10-11T17:48:50Z)
- Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks [82.54695985117783]
We investigate the suitability of state-of-the-art deep learning architectures for continuous emotion recognition using long video sequences captured in-the-wild.
We have developed and evaluated convolutional recurrent neural networks combining 2D-CNNs and long short-term memory (LSTM) units, as well as inflated 3D-CNN models, which are built by inflating the weights of a pre-trained 2D-CNN model during fine-tuning (see the weight-inflation sketch after this list).
arXiv Detail & Related papers (2020-11-18T13:42:05Z)
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [96.08358373137438]
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene.
Our method runs on datasets without any scene- or object-level 3D supervision.
arXiv Detail & Related papers (2020-07-30T17:59:50Z)
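One entry above mentions inflating a pre-trained 2D-CNN into a 3D-CNN. That phrase usually refers to the I3D-style bootstrapping trick: each 2D convolution kernel is repeated along a new temporal axis and rescaled so that, on a video made of identical frames, the inflated filter initially reproduces the 2D network's response. Below is a minimal, generic sketch of that idea in PyTorch, not the cited paper's implementation; the function name and the temporal kernel size are illustrative choices.

```python
# Illustrative sketch of 2D -> 3D weight inflation (the I3D-style trick referenced
# above), not the cited emotion-recognition paper's code.
import torch
import torch.nn as nn


def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Build a Conv3d whose weights are the 2D kernel tiled over time and rescaled."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # Repeat the (out, in, kH, kW) kernel along a new time axis and divide by
        # time_dim so a constant-over-time input yields the original 2D response.
        w3d = conv2d.weight.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d


if __name__ == "__main__":
    conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1)
    conv3d = inflate_conv2d(conv2d, time_dim=3)
    frame = torch.randn(1, 3, 32, 32)
    video = frame.unsqueeze(2).repeat(1, 1, 3, 1, 1)  # same frame repeated over time
    # The center time slice of the 3D response matches the 2D response on that frame.
    out2d = conv2d(frame)
    out3d = conv3d(video)[:, :, 1]
    print(torch.allclose(out2d, out3d, atol=1e-5))
```

Inflating weights this way lets a video model start from large-scale 2D image pretraining instead of training from scratch.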