Comparing Visual Reasoning in Humans and AI
- URL: http://arxiv.org/abs/2104.14102v1
- Date: Thu, 29 Apr 2021 04:44:13 GMT
- Title: Comparing Visual Reasoning in Humans and AI
- Authors: Shravan Murlidaran, William Yang Wang, Miguel P. Eckstein
- Abstract summary: We created a dataset of complex scenes that contained human behaviors and social interactions.
We used a quantitative metric of similarity between the AI/human scene descriptions and a ground truth of five other human descriptions of each scene.
Results show that machine/human agreement on scene descriptions is much lower than human/human agreement for our complex scenes.
- Score: 66.89451296340809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in natural language processing and computer vision have led
to AI models that interpret simple scenes at human levels. Yet, we do not have
a complete understanding of how humans and AI models differ in their
interpretation of more complex scenes. We created a dataset of complex scenes
that contained human behaviors and social interactions. AI and humans had to
describe the scenes with a sentence. We used a quantitative metric of
similarity between the AI/human scene descriptions and a ground truth of five
other human descriptions of each scene. Results show that machine/human
agreement on scene descriptions is much lower than human/human agreement for our
complex scenes. Using an experimental manipulation that occludes different
spatial regions of the scenes, we assessed how machines and humans vary in
utilizing regions of images to understand the scenes. Together, our results are
a first step toward understanding how machines fall short of human visual
reasoning with complex scenes depicting human behaviors.
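A note on method: the similarity metric is not fully specified in this digest. Below is a minimal sketch of one plausible realization, assuming a sentence-embedding model from the sentence-transformers library; the model name, helper function, and example sentences are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: score one generated scene description against five human
# reference descriptions. The paper reports "a quantitative metric of
# similarity" without details here, so sentence-embedding cosine similarity
# is used as an illustrative stand-in (an assumption, not the paper's metric).
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def description_agreement(candidate: str, references: list[str]) -> float:
    """Mean cosine similarity between one description and the references."""
    cand_vec = model.encode([candidate])   # shape (1, d)
    ref_vecs = model.encode(references)    # shape (5, d)
    return float(cosine_similarity(cand_vec, ref_vecs).mean())

# Hypothetical example: five human ground-truth descriptions of one scene.
references = [
    "Two men argue over a chess game in the park.",
    "A man gloats after beating his friend at chess.",
    "Two friends play chess while onlookers watch.",
    "A heated chess match between two park regulars.",
    "People gather around an outdoor chess game.",
]
print(description_agreement("Two men sit at a table outdoors.", references))
```

The occlusion manipulation can be pictured the same way: mask one cell of a coarse grid over the scene image before it is described, then compare agreement scores with and without the mask to localize which regions a describer relies on. A hypothetical sketch, with grid size and fill value as assumptions:

```python
# Hedged sketch of the occlusion manipulation: grey out one cell of a
# grid x grid partition of the image. Grid size and fill value are
# assumptions; the paper's exact occlusion scheme is not given here.
import numpy as np
from PIL import Image

def occlude_region(img: Image.Image, row: int, col: int, grid: int = 4) -> Image.Image:
    """Return a copy of img with one grid cell replaced by mid-grey."""
    arr = np.array(img)
    h, w = arr.shape[0] // grid, arr.shape[1] // grid
    arr[row * h:(row + 1) * h, col * w:(col + 1) * w] = 128  # flat grey mask
    return Image.fromarray(arr)
```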
Related papers
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
arXiv Detail & Related papers (2023-12-05T12:03:00Z)
- Putting People in Their Place: Affordance-Aware Human Insertion into Scenes [61.63825003487104]
We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes.
Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances.
Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition.
arXiv Detail & Related papers (2023-04-27T17:59:58Z)
- Compositional 3D Human-Object Neural Animation [93.38239238988719]
Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics.
In this paper, we address this challenge in HOI animation from a compositional perspective.
We adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations.
arXiv Detail & Related papers (2023-04-27T10:04:56Z)
- Everyone Can Be Picasso? A Computational Framework into the Myth of Human versus AI Painting [8.031314357134795]
We develop a computational framework combining neural latent space and aesthetics features with visual analytics to investigate the difference between human and AI paintings.
We find that AI artworks show a distributional difference from human artworks in both the latent space and in some aesthetic features, such as strokes and sharpness.
Our findings provide concrete evidence for the existing discrepancies between human and AI paintings and further suggest improving AI art through greater consideration of aesthetics and the involvement of human artists.
arXiv Detail & Related papers (2023-04-17T05:48:59Z)
- Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes [15.48297730981114]
We introduce the Human-Art dataset to bridge related tasks in natural and artificial scenarios.
Human-Art contains 50k high-quality images with over 123k person instances from 5 natural and 15 artificial scenarios.
We also provide a rich set of baseline results and detailed analyses for related tasks, including human detection, 2D and 3D human pose estimation, image generation, and motion transfer.
arXiv Detail & Related papers (2023-03-05T20:05:21Z)
- Scene Synthesis from Human Motion [26.2618553074691]
We propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion.
Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps.
It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion.
arXiv Detail & Related papers (2023-01-04T03:30:46Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Stochastic Scene-Aware Motion Prediction [41.6104600038666]
We present a novel data-driven motion synthesis method that models different styles of performing a given action with a target object.
Our method, called SAMP for Scene-Aware Motion Prediction, generalizes to target objects of various geometries while enabling the character to navigate in cluttered scenes.
arXiv Detail & Related papers (2021-08-18T17:56:17Z)
- PLACE: Proximity Learning of Articulation and Contact in 3D Environments [70.50782687884839]
We propose a novel interaction generation method, named PLACE, which explicitly models the proximity between the human body and the 3D scene around it.
Our perceptual study shows that PLACE significantly improves on the state-of-the-art method, approaching the realism of real human-scene interaction.
arXiv Detail & Related papers (2020-08-12T21:00:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.