What Is Considered Complete for Visual Recognition?
- URL: http://arxiv.org/abs/2105.13978v1
- Date: Fri, 28 May 2021 16:59:14 GMT
- Title: What Is Considered Complete for Visual Recognition?
- Authors: Lingxi Xie, Xiaopeng Zhang, Longhui Wei, Jianlong Chang, Qi Tian
- Abstract summary: We advocate for a new type of pre-training task named learning-by-compression.
The computational models are optimized to represent the visual data using compact features.
Semantic annotations, when available, play the role of weak supervision.
- Score: 110.43159801737222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This is an opinion paper. We hope to deliver a key message that current
visual recognition systems are far from complete, i.e., recognizing everything
that humans can recognize, yet it is very unlikely that the gap can be bridged
by continuously increasing human annotations. Based on this observation, we
advocate for a new type of pre-training task named learning-by-compression. The
computational models (e.g., a deep network) are optimized to represent the
visual data using compact features, and the features preserve the ability to
recover the original data. Semantic annotations, when available, play the role
of weak supervision. An important yet challenging issue is the evaluation of
image recovery, where we suggest some design principles and future research
directions. We hope our proposal can inspire the community to pursue the
compression-recovery tradeoff rather than the accuracy-complexity tradeoff.
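The compression-recovery tradeoff the abstract advocates can be illustrated with a minimal sketch. This is not the paper's method, only an assumed toy setup: a linear "encoder" (truncated SVD, i.e., PCA) maps data to a compact k-dimensional feature, a linear "decoder" reconstructs it, and reconstruction error is measured as a function of the code size k.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "images": 100 samples of 64-dim data with low-rank structure plus noise.
data = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 64)) \
       + 0.1 * rng.normal(size=(100, 64))

def compress_recover(X, k):
    """Encode X to k compact features via truncated SVD, then decode.

    Returns (codes, reconstruction): the compact features and the
    recovered data, so reconstruction quality can be compared across k.
    """
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are principal directions.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    codes = (X - mean) @ Vt[:k].T   # compact k-dim features (encoder)
    recon = codes @ Vt[:k] + mean   # recovered data (decoder)
    return codes, recon

# Sweep the code size: larger k trades compression for better recovery.
for k in (2, 8, 32):
    codes, recon = compress_recover(data, k)
    err = np.mean((data - recon) ** 2)
    print(f"k={k:2d}  reconstruction MSE={err:.4f}")
```

Reconstruction error shrinks as k grows, which is the tradeoff curve the paper proposes to optimize (with deep networks rather than a linear map), in place of the usual accuracy-complexity curve.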
Related papers
- Exploring the Evolution of Hidden Activations with Live-Update Visualization [12.377279207342735]
We introduce SentryCam, an automated, real-time visualization tool that reveals the progression of hidden representations during training.
Our results show that this visualization offers a more comprehensive view of the learning dynamics compared to basic metrics.
SentryCam can facilitate detailed analyses, such as studying task transfer and catastrophic forgetting, in a continual learning setting.
arXiv Detail & Related papers (2024-05-24T01:23:20Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge of proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Textual Prompt Guided Image Restoration [18.78902053873706]
"All-in-one" models that can do blind image restoration have been concerned in recent years.
Recent works focus on learning visual prompts from data distribution to identify degradation type.
In this paper, we propose an effective textual-prompt-guided image restoration model.
arXiv Detail & Related papers (2023-12-11T06:56:41Z) - Does Visual Pretraining Help End-to-End Reasoning? [81.4707017038019]
We investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks.
We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens.
We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning.
arXiv Detail & Related papers (2023-07-17T14:08:38Z) - Understanding the Effect of the Long Tail on Neural Network Compression [9.819486253052528]
We study the "long tail" phenomenon in computer vision datasets observed by Feldman, et al.
As compression limits the capacity of a network (and hence also its ability to memorize), we study the question: are mismatches between the full and compressed models correlated with the memorized training data?
arXiv Detail & Related papers (2023-06-09T20:18:05Z) - Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - Feature Forgetting in Continual Representation Learning [48.89340526235304]
We find that representations do not suffer from "catastrophic forgetting" even in plain continual learning, but little else is known about their characteristics.
We devise a protocol for evaluating representation in continual learning, and then use it to present an overview of the basic trends of continual representation learning.
To study the feature forgetting problem, we create a synthetic dataset to identify and visualize the prevalence of feature forgetting in neural networks.
arXiv Detail & Related papers (2022-05-26T13:38:56Z) - Learning to Prompt for Vision-Language Models [82.25005817904027]
Vision-language pre-training has emerged as a promising alternative for representation learning.
It shifts from the tradition of using images and discrete labels to learn a fixed set of weights, seen as visual concepts, to aligning images with raw text using two separate encoders.
Such a paradigm benefits from a broader source of supervision and allows zero-shot transfer to downstream tasks.
arXiv Detail & Related papers (2021-09-02T17:57:31Z)
- Evaluating the Progress of Deep Learning for Visual Relational Concepts [0.6999740786886536]
We will show that difficult tasks are linked to relational concepts from cognitive psychology.
We will review research that is linked to relational concept learning, even if it was not originally presented from this angle.
We will recommend steps to make future datasets more relevant for testing systems on relational reasoning.
arXiv Detail & Related papers (2020-01-29T14:21:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.