Visual Intelligence through Human Interaction
- URL: http://arxiv.org/abs/2111.06913v1
- Date: Fri, 12 Nov 2021 19:37:17 GMT
- Title: Visual Intelligence through Human Interaction
- Authors: Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein
- Abstract summary: We demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision.
We present a crowdsourcing interface for speeding up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models.
We also develop a system to ensure that human evaluation of generative vision models is reliable, affordable, and grounded in psychophysics theory.
- Score: 43.82765410550207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the last decade, Computer Vision, the branch of Artificial Intelligence
aimed at understanding the visual world, has evolved from simply recognizing
objects in images to describing pictures, answering questions about images,
helping robots maneuver through physical spaces, and even generating novel visual
content. As these tasks and applications have modernized, so too has the
reliance on more data, either for model training or for evaluation. In this
chapter, we demonstrate that novel interaction strategies can enable new forms
of data collection and evaluation for Computer Vision. First, we present a
crowdsourcing interface for speeding up paid data collection by an order of
magnitude, feeding the data-hungry nature of modern vision models. Second, we
explore a method to increase volunteer contributions using automated social
interventions. Third, we develop a system to ensure human evaluation of
generative vision models is reliable, affordable, and grounded in psychophysics
theory. We conclude with future opportunities for Human-Computer Interaction to
aid Computer Vision.
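To make the psychophysics-grounded evaluation concrete, below is a minimal Python sketch. This is my own illustration under assumptions, not the authors' actual system: it assumes raters judge each generated image as real or fake, and scores the model by the fraction of fakes judged real (a "fooling rate"), with a percentile-bootstrap confidence interval. The names fooling_rate and bootstrap_ci are hypothetical.

import random
from statistics import mean

def fooling_rate(judgments):
    # judgments: list of bools, True when a rater judged a generated image "real".
    # Bools are ints in Python, so mean() yields the fraction judged real.
    return mean(judgments)

def bootstrap_ci(judgments, n_resamples=2000, alpha=0.05):
    # Percentile bootstrap: resample rater judgments with replacement,
    # recompute the fooling rate each time, and take the (alpha/2, 1-alpha/2)
    # quantiles of the resampled statistics.
    stats = sorted(
        mean(random.choices(judgments, k=len(judgments)))
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Example with 200 simulated rater judgments (hypothetical data).
judgments = [random.random() < 0.27 for _ in range(200)]
print(f"fooling rate: {fooling_rate(judgments):.2f}")
print(f"95% CI: {bootstrap_ci(judgments)}")

A real psychophysics-grounded system would also control presentation time and screen raters for reliability; this sketch captures only the aggregation step.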
Related papers
- Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction [10.260966795508569]
It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning-based visual perception model.
We employ state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot.
arXiv Detail & Related papers (2024-03-04T06:47:06Z)
- AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
Generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs.
This paper explores the innovative concept of harnessing these AI-generated images as new data sources.
In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z)
- InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [52.981128371910266]
We present InstructDiffusion, a framework for aligning computer vision tasks with human instructions.
InstructDiffusion can handle a variety of vision tasks, including both understanding and generative tasks.
It even exhibits the ability to handle unseen tasks and outperforms prior methods on novel datasets.
arXiv Detail & Related papers (2023-09-07T17:56:57Z)
- Procedural Humans for Computer Vision [1.9550079119934403]
We build a parametric model of the face and body, including articulated hands, and use it to generate realistic images of humans.
We show that this can be extended to the full body by building on the pipeline of Wood et al., generating synthetic images of humans in their entirety.
arXiv Detail & Related papers (2023-01-03T15:44:48Z)
- Can machines learn to see without visual databases? [93.73109506642112]
This paper focuses on developing machines that learn to see without needing to handle visual databases.
This might open the door to a truly competitive track for deep learning technologies in vision.
arXiv Detail & Related papers (2021-10-12T13:03:54Z)
- Visual Perspective Taking for Opponent Behavior Modeling [22.69165968663182]
We propose an end-to-end long-term visual prediction framework for robots.
We demonstrate our approach in the context of visual hide-and-seek.
We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.
arXiv Detail & Related papers (2021-05-11T16:02:32Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a state-of-the-art visual-only method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- State of the Art on Neural Rendering [141.22760314536438]
We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs.
This report focuses on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence.
arXiv Detail & Related papers (2020-04-08T04:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.