Bridging Gap between Image Pixels and Semantics via Supervision: A Survey
- URL: http://arxiv.org/abs/2107.13757v1
- Date: Thu, 29 Jul 2021 05:55:40 GMT
- Authors: Jiali Duan, C.-C. Jay Kuo
- Abstract summary: The semantic gap between low-level features and the semantic meanings of images has been known for decades.
We claim that the semantic gap is primarily bridged through supervised learning today.
- Score: 34.3339977386633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fact that there exists a gap between low-level features and semantic
meanings of images, called the semantic gap, has been known for decades. Resolving
the semantic gap is a long-standing problem. This work reviews the semantic gap
problem and surveys recent efforts to bridge the gap. Most importantly, we claim
that the semantic gap is primarily bridged through supervised learning today.
Experiences are drawn from two application domains to illustrate this point:
1) object detection and 2) metric learning for content-based image retrieval
(CBIR). To begin with, this paper offers a historical retrospective on
supervision, makes a gradual transition to the modern data-driven methodology,
and introduces commonly used datasets. Then, it summarizes various supervision
methods for bridging the semantic gap in the context of object detection and
metric learning.
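As a concrete illustration of supervised metric learning for CBIR, the widely used triplet loss pulls an anchor image's embedding closer to a same-class (positive) image than to a different-class (negative) one. The following is a minimal toy sketch of that idea, not a formulation taken from the survey:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative by at least `margin` (squared L2 distance)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive lies near the anchor, the negative far away.
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])
negative = np.array([-1.0, 0.0])

loss = triplet_loss(anchor, positive, negative)  # 0.0 here: margin satisfied
```

In a real CBIR system the embeddings would come from a trained network and the loss would be averaged over mined triplets; the sketch only shows the supervision signal itself.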
Related papers
- Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models [33.690846523358836]
Weakly-supervised semantic segmentation (WSSS) is an effective way to avoid costly pixel-level labels.
We focus on the WSSS with image-level labels, which is the most challenging form of WSSS.
We investigate the applicability of visual foundation models, such as the Segment Anything Model (SAM), in the context of WSSS.
arXiv Detail & Related papers (2023-10-19T07:16:54Z)
- Shrinking the Semantic Gap: Spatial Pooling of Local Moment Invariants for Copy-Move Forgery Detection [7.460203098159187]
Copy-move forgery is a manipulation in which specific patches are copied and pasted within an image, with potentially illegal or unethical uses.
Recent advances in the forensic methods for copy-move forgery have shown increasing success in detection accuracy and robustness.
For images with high self-similarity or strong signal corruption, the existing algorithms often exhibit inefficient processes and unreliable results.
arXiv Detail & Related papers (2022-07-19T09:11:43Z)
- Content-Based Detection of Temporal Metadata Manipulation [91.34308819261905]
We propose an end-to-end approach to verify whether the purported time of capture of an image is consistent with its content and geographic location.
The central idea is the use of supervised consistency verification, in which we predict the probability that the image content, capture time, and geographical location are consistent.
Our approach improves upon previous work on a large benchmark dataset, increasing the classification accuracy from 59.03% to 81.07%.
arXiv Detail & Related papers (2021-03-08T13:16:19Z)
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
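Contrastive objectives of this kind are generically of the InfoNCE form: each embedding is trained to match its designated positive more strongly than a set of negatives. A minimal sketch of that generic objective (not the paper's exact formulation) is:

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss on unit-normalized embeddings:
    the query should score higher with its positive than with any negative."""
    unit = lambda v: v / np.linalg.norm(v)
    q = unit(query)
    # Similarities with the positive placed at index 0, then the negatives.
    sims = np.array([q @ unit(positive)] + [q @ unit(n) for n in negatives])
    logits = sims / temperature
    # Cross-entropy against the positive at index 0.
    return float(np.log(np.sum(np.exp(logits))) - logits[0])

# Toy 3-D "pixel embeddings": the positive is close to the query.
query     = np.array([1.0, 0.0, 0.0])
positive  = np.array([0.95, 0.05, 0.0])
negatives = [np.array([0.0, 1.0, 0.0]),
             np.array([0.0, 0.0, 1.0]),
             np.array([-1.0, 0.0, 0.0])]

loss = info_nce(query, positive, negatives)  # small: the pair is well aligned
```

In the paper's setting, the positives and negatives would be selected using the predetermined prior over object mask proposals rather than hand-picked as above.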
arXiv Detail & Related papers (2021-02-11T18:54:47Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
- Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions as the pixel-wise saliency refinement.
Our method is simple yet effective; it is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.