Towards Zero-Shot Camera Trap Image Categorization
- URL: http://arxiv.org/abs/2410.12769v1
- Date: Wed, 16 Oct 2024 17:44:58 GMT
- Title: Towards Zero-Shot Camera Trap Image Categorization
- Authors: Jiří Vyskočil, Lukas Picek,
- Abstract summary: This paper describes the search for an alternative approach to the automatic categorization of camera trap images.
We benchmark state-of-the-art classifiers using a single model for all images.
Next, we evaluate methods combining MegaDetector with one or more classifiers and Segment Anything to assess their impact on reducing location-specific overfitting.
Last, we propose and test two approaches using large language and foundational models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario.
- Score: 0.0
- License:
- Abstract: This paper describes the search for an alternative approach to the automatic categorization of camera trap images. First, we benchmark state-of-the-art classifiers using a single model for all images. Next, we evaluate methods combining MegaDetector with one or more classifiers and Segment Anything to assess their impact on reducing location-specific overfitting. Last, we propose and test two approaches using large language and foundational models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario. Evaluation carried out on two publicly available datasets (WCT from New Zealand, CCT20 from the Southwestern US) and a private dataset (CEF from Central Europe) revealed that combining MegaDetector with two separate classifiers achieves the highest accuracy. This approach reduced the relative error of a single BEiTV2 classifier by approximately 42\% on CCT20, 48\% on CEF, and 75\% on WCT. Besides, as the background is removed, the error in terms of accuracy in new locations is reduced to half. The proposed zero-shot pipeline based on DINOv2 and FAISS achieved competitive results (1.0\% and 4.7\% smaller on CCT20, and CEF, respectively), which highlights the potential of zero-shot approaches for camera trap image categorization.
Related papers
- Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection [4.0208298639821525]
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community.
Recent studies show that adapting a pre-trained model or modified loss function can improve performance.
We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN.
arXiv Detail & Related papers (2023-11-01T04:04:34Z) - Rethinking Image Forgery Detection via Contrastive Learning and
Unsupervised Clustering [26.923409536155166]
We propose FOrensic ContrAstive cLustering (FOCAL) method for image forgery detection.
FOCAL is based on contrastive learning and unsupervised clustering.
Results show FOCAL significantly outperforms state-of-the-art competing algorithms.
arXiv Detail & Related papers (2023-08-18T05:05:30Z) - Enhanced Sharp-GAN For Histopathology Image Synthesis [63.845552349914186]
Histopathology image synthesis aims to address the data shortage issue in training deep learning approaches for accurate cancer detection.
We propose a novel approach that enhances the quality of synthetic images by using nuclei topology and contour regularization.
The proposed approach outperforms Sharp-GAN in all four image quality metrics on two datasets.
arXiv Detail & Related papers (2023-01-24T17:54:01Z) - Adaptation to CT Reconstruction Kernels by Enforcing Cross-domain
Feature Maps Consistency [0.06117371161379209]
We show a decrease in the COVID-19 segmentation quality of the model trained on the smooth and tested on the sharp reconstruction kernels.
We propose the unsupervised adaptation method, called F-Consistency, that outperforms the previous approaches.
arXiv Detail & Related papers (2022-03-28T10:00:03Z) - A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained
Vision-language Model [61.58071099082296]
It is unclear how to make zero-shot recognition working well on broader vision problems, such as object detection and semantic segmentation.
In this paper, we target for zero-shot semantic segmentation, by building it on an off-the-shelf pre-trained vision-language model, i.e., CLIP.
Our experimental results show that this simple framework surpasses previous state-of-the-arts by a large margin.
arXiv Detail & Related papers (2021-12-29T18:56:18Z) - Hierarchical Convolutional Neural Network with Feature Preservation and
Autotuned Thresholding for Crack Detection [5.735035463793008]
Drone imagery is increasingly used in automated inspection for infrastructure surface defects.
This paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation.
The proposed technique is then applied to identify surface cracks on the surface of roads, bridges or pavements.
arXiv Detail & Related papers (2021-04-21T13:07:58Z) - DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNN.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms supervised method by 1.6/1.2/1.0 AP on Mask RCNN-C4/FPN/RetinaNet with 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower)
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - Generalized Focal Loss: Learning Qualified and Distributed Bounding
Boxes for Dense Object Detection [85.53263670166304]
One-stage detector basically formulates object detection as dense classification and localization.
Recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization.
This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization.
arXiv Detail & Related papers (2020-06-08T07:24:33Z) - Evading Deepfake-Image Detectors with White- and Black-Box Attacks [75.13740810603686]
We show that a popular forensic approach trains a neural network to distinguish real from synthetic content.
We develop five attack case studies on a state-of-the-art classifier that achieves an area under the ROC curve (AUC) of 0.95 on almost all existing image generators.
We also develop a black-box attack that, with no access to the target classifier, reduces the AUC to 0.22.
arXiv Detail & Related papers (2020-04-01T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.