Automatic Individual Identification of Patterned Solitary Species Based
on Unlabeled Video Data
- URL: http://arxiv.org/abs/2304.09657v1
- Date: Wed, 19 Apr 2023 13:46:16 GMT
- Authors: Vanessa Suessle, Mimi Arandjelovic, Ammie K. Kalan, Anthony Agbor,
Christophe Boesch, Gregory Brazzola, Tobias Deschner, Paula Dieguez,
Anne-Céline Granjon, Hjalmar Kuehl, Anja Landsmann, Juan Lapuente, Nuria
Maldonado, Amelia Meier, Zuzana Rockaiova, Erin G. Wessling, Roman M. Wittig,
Colleen T. Downs, Andreas Weinmann, Elke Hergenroether
- Abstract summary: We developed a pipeline to analyze videos from camera traps to identify individuals without requiring manual interaction.
This pipeline applies to animal species with uniquely identifiable fur patterns and solitary behavior, such as leopards (Panthera pardus).
The pipeline was tested on a dataset of leopard videos collected by the Pan African Programme: The Cultured Chimpanzee (PanAf).
- Score: 7.667274758235099
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The manual processing and analysis of videos from camera traps is
time-consuming and involves several steps, from filtering falsely triggered
footage to identifying and re-identifying individuals. In
this study, we developed a pipeline to automatically analyze videos from camera
traps to identify individuals without requiring manual interaction. This
pipeline applies to animal species with uniquely identifiable fur patterns and
solitary behavior, such as leopards (Panthera pardus). We assumed that the same
individual was seen throughout one triggered video sequence. With this
assumption, multiple images could be assigned to an individual for the initial
database filling without pre-labeling. The pipeline was based on
well-established components from computer vision and deep learning,
particularly convolutional neural networks (CNNs) and scale-invariant feature
transform (SIFT) features. We augmented this basis by implementing additional
components to replace otherwise-required human interaction. Based on the
similarity between frames of the video material, clusters were formed that
represented individuals, bypassing the open-set problem of the unknown total
population. The pipeline was tested on a dataset of leopard videos collected by
the Pan African Programme: The Cultured Chimpanzee (PanAf) and achieved a
success rate of over 83% for correct matches between previously unknown
individuals. The proposed pipeline can become a valuable tool for future
conservation projects based on camera trap data, reducing the work of manual
analysis for individual identification when labeled data are unavailable.
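The clustering stage described above can be sketched in a few lines. The following is a minimal, hypothetical re-implementation, not the authors' code: in the actual pipeline the pairwise similarity would come from SIFT keypoint matches between fur-pattern regions, while here a precomputed similarity matrix is taken as input. Frames whose similarity exceeds a threshold are merged transitively into one cluster, each cluster standing for one individual, so no fixed number of individuals has to be known in advance.

```python
def cluster_frames(similarity, threshold):
    """Union-find clustering: frames i and j end up in the same cluster
    whenever similarity[i][j] >= threshold (directly or transitively)."""
    n = len(similarity)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge every sufficiently similar pair of frames.
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Collect frames by their root: one list per putative individual.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Toy example: frames 0-1 resemble each other, as do frames 2-3.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(cluster_frames(sim, 0.5))  # [[0, 1], [2, 3]]
```

Because clusters emerge from the similarity graph rather than from a classifier over a closed label set, a previously unseen individual simply forms a new cluster, which is how the open-set problem is sidestepped.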
Related papers
- BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos [0.8074955699721389]
This study presents a novel dataset from drone videos for baboon detection, tracking, and behavior recognition.
The baboon detection dataset was created by manually annotating all baboons in drone videos with bounding boxes.
The behavior recognition dataset was generated by converting tracks into mini-scenes, a video subregion centered on each animal.
arXiv Detail & Related papers (2024-05-27T23:09:37Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
arXiv Detail & Related papers (2024-05-14T17:59:02Z)
- ID-Animator: Zero-Shot Identity-Preserving Human Video Generation [16.438935466843304]
ID-Animator is a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training.
Our method is highly compatible with popular pre-trained T2V models like AnimateDiff and various community backbone models.
arXiv Detail & Related papers (2024-04-23T17:59:43Z)
- MBW: Multi-view Bootstrapping in the Wild [30.038254895713276]
Multi-camera systems that train fine-grained detectors have shown promise in detecting such errors.
The approach is based on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios.
In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates.
We are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches.
arXiv Detail & Related papers (2022-10-04T16:27:54Z)
- TubeFormer-DeepLab: Video Mask Transformer [98.47947102154217]
We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner.
TubeFormer-DeepLab directly predicts video tubes with task-specific labels.
arXiv Detail & Related papers (2022-05-30T18:10:33Z)
- Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- The iWildCam 2021 Competition Dataset [5.612688040565423]
Ecologists use camera traps to monitor animal populations all over the world.
To estimate the abundance of a species, ecologists need to know not just which species were seen, but how many individuals of each species were seen.
We have prepared a challenge where the training data and test data are from different cameras spread across the globe.
arXiv Detail & Related papers (2021-05-07T20:27:22Z)
- Intra-Inter Camera Similarity for Unsupervised Person Re-Identification [50.85048976506701]
We study a novel intra-inter camera similarity for pseudo-label generation.
We train our re-id model in two stages with intra-camera and inter-camera pseudo-labels, respectively.
This simple intra-inter camera similarity produces surprisingly good performance on multiple datasets.
arXiv Detail & Related papers (2021-03-22T08:29:04Z)
- Red Carpet to Fight Club: Partially-supervised Domain Transfer for Face Recognition in Violent Videos [12.534785814117065]
We introduce the WildestFaces dataset to study cross-domain recognition under a variety of adverse conditions.
We establish a rigorous evaluation protocol for this clean-to-violent recognition task, and present a detailed analysis of the proposed dataset and the methods.
arXiv Detail & Related papers (2020-09-16T09:45:33Z)
- Learning Person Re-identification Models from Videos with Weak Supervision [53.53606308822736]
We introduce the problem of learning person re-identification models from videos with weak supervision.
We propose a multiple instance attention learning framework for person re-identification using such video-level labels.
The attention weights are obtained based on all person images instead of person tracklets in a video, making our learned model less affected by noisy annotations.
arXiv Detail & Related papers (2020-07-21T07:23:32Z)
- Labelling unlabelled videos from scratch with multi-modal self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels.
arXiv Detail & Related papers (2020-06-24T12:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.