Can machines learn to see without visual databases?
- URL: http://arxiv.org/abs/2110.05973v1
- Date: Tue, 12 Oct 2021 13:03:54 GMT
- Title: Can machines learn to see without visual databases?
- Authors: Alessandro Betti, Marco Gori, Stefano Melacci, Marcello Pelillo, Fabio Roli
- Abstract summary: This paper focuses on developing machines that learn to see without needing to handle visual databases.
This could open the door to a genuinely competitive track for deep learning technologies in vision.
- Score: 93.73109506642112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper argues that the time has come to think about learning machines that acquire visual skills in a truly human-like setting, where the only object-level supervision comes from vocal interactions and pointing aids. This likely requires new foundations for the computational processes of vision, with the ultimate goal of engaging machines in visual description tasks as they live in their own visual environment under simple human-machine linguistic interaction. The challenge is to develop machines that learn to see without needing to handle visual databases. This might open the door to a truly orthogonal, competitive track for deep learning technologies in vision, one that does not rely on the accumulation of huge visual databases.
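As a purely illustrative aside (not taken from the paper), the setting sketched in the abstract can be pictured as an online learning loop: the agent processes a continuous video stream and updates itself only when a human occasionally intervenes, e.g. with a spoken label and a pointing gesture, and no visual database is ever stored. The minimal Python mock-up below is a sketch under that assumption; all names (VisualAgent, video_stream, human_hint) and the toy feature and update rules are hypothetical placeholders, not the authors' method.

```python
import random

class VisualAgent:
    """Hypothetical online learner: adapts from a frame stream and keeps only
    per-label feature prototypes, never a database of images."""

    def __init__(self):
        self.prototypes = {}  # label -> running average of observed features

    def perceive(self, frame):
        # Toy "feature extractor"; a real agent would use a learned visual encoder.
        return frame

    def update(self, features, label):
        # Incorporate one sparse supervision event (spoken label + pointed region).
        proto = self.prototypes.setdefault(label, list(features))
        self.prototypes[label] = [0.9 * p + 0.1 * f for p, f in zip(proto, features)]


def video_stream(num_frames=1000, feature_dim=16):
    # Stand-in for a live camera: yields synthetic frames as feature vectors.
    for _ in range(num_frames):
        yield [random.random() for _ in range(feature_dim)]


def human_hint():
    # Stand-in for occasional vocal/pointing supervision; absent most of the time.
    return random.choice(["cup", "chair"]) if random.random() < 0.01 else None


agent = VisualAgent()
for frame in video_stream():
    features = agent.perceive(frame)
    label = human_hint()
    if label is not None:  # learn only when a human actually intervenes
        agent.update(features, label)

print("labels the agent has been told about:", sorted(agent.prototypes))
```

The point of the sketch is only the control flow: learning is driven by the stream and by rare human interventions, rather than by iterating over a stored dataset.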
Related papers
- Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction [10.260966795508569]
It is unclear how objective interaction performance and subjective user experience are influenced when a social robot adopts a deep-learning-based visual perception model.
We employ state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot.
arXiv Detail & Related papers (2024-03-04T06:47:06Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Deep Learning to See: Towards New Foundations of Computer Vision [88.69805848302266]
This book criticizes the supposed scientific progress in the field of computer vision.
It proposes the investigation of vision within the framework of information-based laws of nature.
arXiv Detail & Related papers (2022-06-30T15:20:36Z)
- Visual Intelligence through Human Interaction [43.82765410550207]
We demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision.
We present a crowdsourcing interface for speeding up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models.
We also develop a system to ensure that human evaluations of generative vision models are reliable, affordable, and grounded in psychophysics theory.
arXiv Detail & Related papers (2021-11-12T19:37:17Z)
- Learning Visually Guided Latent Actions for Assistive Teleoperation [9.75385535829762]
We develop assistive robots that condition their latent embeddings on visual inputs.
We show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables i) accurately recognizing the current context and ii) generalizing control embeddings to new objects and tasks.
arXiv Detail & Related papers (2021-05-02T23:58:28Z)
- Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation [6.853826783413853]
Humans have impressive generalization capabilities when it comes to manipulating objects in novel environments.
How to learn such body schemas for robots remains an open problem.
We develop a self-supervised approach that can extend a robot's kinematic model when grasping an object from visual latent representations.
arXiv Detail & Related papers (2020-11-08T01:04:59Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Advancing Visual Specification of Code Requirements for Graphs [0.0]
This paper focuses on producing meaningful visualizations of data using machine learning.
We allow the user to visually specify their code requirements in order to lower the barrier for humanities researchers to learn how to program visualizations.
We use a hybrid model, combining a neural network and optical character recognition to generate the code to create the visualization.
arXiv Detail & Related papers (2020-07-29T17:01:53Z)
- Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense [142.53911271465344]
We argue that the next generation of AI must embrace "dark" humanlike common sense for solving novel tasks.
We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense.
arXiv Detail & Related papers (2020-04-20T04:07:28Z)
- Vision and Language: from Visual Perception to Content Creation [100.36776435627962]
"vision to language" is probably one of the most popular topics in the past five years.
This paper reviews the recent advances along these two dimensions: "vision to language" and "language to vision"
arXiv Detail & Related papers (2019-12-26T14:07:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.