From images in the wild to video-informed image classification
- URL: http://arxiv.org/abs/2109.12040v1
- Date: Fri, 24 Sep 2021 15:53:37 GMT
- Title: From images in the wild to video-informed image classification
- Authors: Marc Böhlen, Varun Chandola, Wawan Sujarwo, Raunaq Jain
- Score: 0.7804710977378488
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image classifiers work effectively when applied to structured images, yet they often fail when applied to images with very high visual complexity. This paper describes experiments applying state-of-the-art object classifiers to a unique set of images in the wild with high visual complexity, collected on the island of Bali. The text describes differences between actual images in the wild and images from ImageNet, and then discusses a novel approach that combines informational cues particular to video with an ensemble of imperfect classifiers to improve classification results on video-sourced images of plants in the wild.
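The abstract describes the approach only at a high level. As a rough illustration of what combining an ensemble of imperfect per-frame classifiers with video-specific temporal cues could look like, the sketch below is a hypothetical reading, not the paper's actual method: the function name, the probability-averaging rule, and the moving-average window are all assumptions.

```python
# Hypothetical sketch: fuse an ensemble of imperfect classifiers and
# exploit temporal continuity in video. Not the authors' code.
import numpy as np

def classify_video_frames(frame_probs: np.ndarray, window: int = 5) -> np.ndarray:
    """Return a predicted class index per frame.

    frame_probs: shape (n_models, n_frames, n_classes), per-frame class
        probabilities from each classifier in the ensemble.
    window: number of neighboring frames to average over, standing in
        for the video-specific cues the abstract mentions (consecutive
        frames usually show the same plant).
    """
    # Ensemble step: average the classifiers' probabilities per frame.
    fused = frame_probs.mean(axis=0)  # (n_frames, n_classes)

    # Temporal step: moving average of each class score across frames.
    kernel = np.ones(window) / window
    smoothed = np.apply_along_axis(
        lambda scores: np.convolve(scores, kernel, mode="same"), 0, fused
    )
    return smoothed.argmax(axis=1)

# Toy usage: 3 classifiers, 100 frames, 10 plant classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(3, 100))  # (3, 100, 10)
print(classify_video_frames(probs)[:10])
```

Even this naive temporal smoothing tends to suppress single-frame misclassifications, which is the kind of gain the abstract attributes to video-informed classification.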
Related papers
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Image Captioners Sometimes Tell More Than Images They See [8.640488282016351]
Image captioning, a.k.a. "image-to-text," generates descriptive text from given images.
We have performed experiments involving the classification of images from descriptive text alone.
We have evaluated several image captioning models with respect to a disaster image classification task, CrisisNLP.
arXiv Detail & Related papers (2023-05-04T15:32:41Z)
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study of the detection of deepfakes generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
- Multi-modal reward for visual relationships-based image captioning [4.354364351426983]
This paper proposes a deep neural network architecture for image captioning based on fusing the visual-relationship information extracted from an image's scene graph with the spatial feature maps of the image.
A multi-modal reward function is then introduced for deep reinforcement learning of the proposed network using a combination of language and vision similarities in a common embedding space.
arXiv Detail & Related papers (2023-03-19T20:52:44Z)
- Arbitrary Style Transfer with Structure Enhancement by Combining the Global and Local Loss [51.309905690367835]
We introduce a novel arbitrary style transfer method with structure enhancement by combining the global and local loss.
Experimental results demonstrate that our method can generate higher-quality images with impressive visual effects.
arXiv Detail & Related papers (2022-07-23T07:02:57Z)
- Iconographic Image Captioning for Artworks [2.3859169601259342]
This work utilizes a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography.
The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network model on the image captioning task.
A transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset.
The quality of the generated captions and the model's capacity to generalize to new data are explored by applying the model to a new collection of paintings and analyzing the relation between commonly generated captions and artistic genre.
arXiv Detail & Related papers (2021-02-07T23:11:33Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes [54.836331922449666]
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrate the superiority of our proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)
- I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively [135.7695909882746]
We introduce the MAximum Discrepancy (MAD) competition.
We adaptively sample a small test set from an arbitrarily large corpus of unlabeled images.
Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers.
arXiv Detail & Related papers (2020-02-25T03:32:29Z)
- Learning Transformation-Aware Embeddings for Image Forensics [15.484408315588569]
Image Provenance Analysis aims at discovering relationships among different manipulated image versions that share content.
One of the main sub-problems for provenance analysis that has not yet been addressed directly is the edit ordering of images that share full content or are near-duplicates.
This paper introduces a novel deep learning-based approach to provide a plausible ordering to images that have been generated from a single image through transformations.
arXiv Detail & Related papers (2020-01-13T22:01:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.