Triple-stream Deep Metric Learning of Great Ape Behavioural Actions
- URL: http://arxiv.org/abs/2301.02642v1
- Date: Fri, 6 Jan 2023 18:36:04 GMT
- Title: Triple-stream Deep Metric Learning of Great Ape Behavioural Actions
- Authors: Otto Brookes, Majid Mirmehdi, Hjalmar Kühl, Tilo Burghardt
- Abstract summary: We propose the first metric learning system for the recognition of great ape behavioural actions.
Our proposed triple stream embedding architecture works on camera trap videos taken directly in the wild.
- Score: 3.8820728151341717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose the first metric learning system for the recognition of great ape
behavioural actions. Our proposed triple stream embedding architecture works on
camera trap videos taken directly in the wild and demonstrates that the
utilisation of an explicit DensePose-C chimpanzee body part segmentation stream
effectively complements traditional RGB appearance and optical flow streams. We
evaluate system variants with different feature fusion techniques and long-tail
recognition approaches. Results and ablations show performance improvements of
~12% in top-1 accuracy over previous results achieved on the PanAf-500 dataset
containing 180,000 manually annotated frames across nine behavioural actions.
Furthermore, we provide a qualitative analysis of our findings and augment the
metric learning system with long-tail recognition techniques showing that
average per class accuracy -- critical in the domain -- can be improved by ~23%
compared to the literature on that dataset. Finally, since our embedding
spaces are constructed as metric spaces, we provide the first data-driven
visualisations of the great ape behavioural action spaces, revealing their
emerging geometry and topology.
We hope that the work sparks further interest in this vital application area of
computer vision for the benefit of endangered great apes.
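To make the architecture concrete, below is a minimal PyTorch sketch of a triple-stream embedding of this kind: three stand-in 3D-CNN encoders for RGB, optical flow, and DensePose-C body-part segmentation, concatenation fusion, and a triplet loss over the resulting metric space. The backbones, dimensions, and fusion choice here are illustrative assumptions, not the paper's exact design; the abstract notes that several fusion variants were evaluated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StreamEncoder(nn.Module):
    """Stand-in 3D-CNN encoder for a single input stream."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # global spatio-temporal pooling
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x):                     # x: (B, C, T, H, W)
        return self.net(x)

class TripleStreamEmbedder(nn.Module):
    """RGB + optical flow + body-part streams, fused into one embedding."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.rgb   = StreamEncoder(in_channels=3)   # appearance
        self.flow  = StreamEncoder(in_channels=2)   # (dx, dy) optical flow
        self.parts = StreamEncoder(in_channels=1)   # DensePose-C part map
        self.fuse  = nn.Linear(3 * 256, embed_dim)  # concatenation fusion

    def forward(self, rgb, flow, parts):
        z = torch.cat([self.rgb(rgb), self.flow(flow), self.parts(parts)], dim=1)
        return F.normalize(self.fuse(z), dim=1)     # unit-norm metric embedding

def fake_clip(batch=2, frames=8, size=64):
    return (torch.randn(batch, 3, frames, size, size),
            torch.randn(batch, 2, frames, size, size),
            torch.randn(batch, 1, frames, size, size))

model = TripleStreamEmbedder()
triplet = nn.TripletMarginLoss(margin=0.2)
# anchor/positive would share a behaviour class; negative comes from another
loss = triplet(model(*fake_clip()), model(*fake_clip()), model(*fake_clip()))
```

Concatenation is only the simplest fusion option; the same skeleton accommodates attention-based fusion by swapping out the `fuse` layer.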
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
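The abstract does not spell out the neighbor-silencing loss, so the following is a hedged sketch of one plausible reading: logits of the auxiliary open-set classes are scaled down so they shape the representation without dominating the real classes. The function name, `aux_mask`, and the `silence` factor are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def neighbor_silencing_ce(logits, targets, aux_mask, silence=0.2):
    """Cross-entropy with down-weighted auxiliary-class logits (assumed form).

    logits:   (B, C) over real + auxiliary classes
    targets:  (B,) labels over the real classes
    aux_mask: (C,) bool, True for auxiliary (open-set) classes
    silence:  factor shrinking auxiliary logits toward zero
    """
    scaled = logits.clone()
    scaled[:, aux_mask] = logits[:, aux_mask] * silence  # quiet the neighbors
    return F.cross_entropy(scaled, targets)

logits = torch.randn(8, 12)                    # e.g. 9 real + 3 auxiliary classes
aux_mask = torch.tensor([False] * 9 + [True] * 3)
targets = torch.randint(0, 9, (8,))            # labels over real classes only
loss = neighbor_silencing_ce(logits, targets, aux_mask)
```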
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning [59.68917139718813]
We show that a strong off-the-shelf frozen pretrained visual encoder can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning.
By conditioning on frozen clip-level embeddings from observed steps to predict the actions of unseen steps, our prediction model is able to learn robust representations for forecasting.
arXiv Detail & Related papers (2024-10-04T14:52:09Z)
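A minimal sketch of this latent prediction pattern follows, assuming a frozen clip encoder whose embeddings are consumed by a small transformer that regresses the next step's embedding; VEDIT's actual encoder, prediction head, and objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentStepPredictor(nn.Module):
    """Transformer over frozen step embeddings; regresses the next step."""
    def __init__(self, dim=512, depth=2, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, dim)

    def forward(self, observed):               # (B, T_obs, dim) frozen features
        h = self.trunk(observed)
        return self.head(h[:, -1])             # embedding of the unseen next step

predictor = LatentStepPredictor()
observed = torch.randn(4, 5, 512)              # 5 observed steps per video
target = torch.randn(4, 512)                   # frozen embedding of the next step
loss = F.mse_loss(predictor(observed), target)
```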
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition [5.253376886484742]
We present a vision-language model which employs multi-modal decoding of visual features extracted directly from camera trap videos.
We evaluate our system on the PanAf500 and PanAf20K datasets.
We achieve state-of-the-art performance over vision and vision-language models in top-1 accuracy.
arXiv Detail & Related papers (2024-04-13T09:17:51Z)
TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged spatial-then-temporal encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
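A rough sketch of such a two-staged factorisation is shown below: a per-frame 2D CNN provides the spatial stage and a transformer over frame features provides the temporal stage. TempNet's actual encoder bridge and residual blocks are not reproduced here; every module is a stand-in.

```python
import torch
import torch.nn as nn

class SpatialThenTemporal(nn.Module):
    """Stage 1: per-frame 2D CNN. Stage 2: temporal attention over frames."""
    def __init__(self, dim=128, num_classes=2):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)   # e.g. startle vs. background

    def forward(self, clip):                     # clip: (B, T, 3, H, W)
        B, T = clip.shape[:2]
        frames = self.spatial(clip.flatten(0, 1)).view(B, T, -1)
        return self.cls(self.temporal(frames).mean(dim=1))  # pool over time

logits = SpatialThenTemporal()(torch.randn(2, 16, 3, 64, 64))
```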
Explored An Effective Methodology for Fine-Grained Snake Recognition [8.908667065576632]
We design a strong multimodal backbone to utilize various meta-information to assist in fine-grained identification.
To take full advantage of unlabeled datasets, we jointly train with self-supervised and supervised objectives.
Our method achieves macro F1 scores of 92.7% and 89.4% on the private and public datasets, respectively, taking 1st place among participants on the private leaderboard.
arXiv Detail & Related papers (2022-07-24T02:19:15Z)
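The joint training recipe can be summarised as a weighted sum of a supervised and a self-supervised objective. The sketch below assumes a SimSiam-style consistency term and an `ssl_weight` factor; the paper's actual SSL objective and multimodal meta-information fusion are not specified in the summary above.

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, labels, z1, z2, ssl_weight=0.5):
    """Supervised cross-entropy plus a self-supervised consistency term.

    z1, z2 are projections of two augmented views of the same (possibly
    unlabelled) image; the stop-gradient mirrors SimSiam-style training.
    """
    sup = F.cross_entropy(logits, labels)
    ssl = -F.cosine_similarity(z1, z2.detach(), dim=1).mean()
    return sup + ssl_weight * ssl

loss = joint_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)),
                  torch.randn(8, 64), torch.randn(8, 64))
```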
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
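As an illustration of the temporal pretext task, the sketch below cuts a skeleton sequence into three equal segments, shuffles them, and keeps the permutation index as a classification label; the segment count and the analogous body-part variant are assumptions.

```python
import itertools
import random
import torch

PERMS = list(itertools.permutations(range(3)))   # 3 segments -> 6 classes

def permute_segments(seq):
    """seq: (T, J, C) skeleton sequence; returns shuffled copy + perm label."""
    usable = seq[: seq.shape[0] - seq.shape[0] % 3]
    segs = torch.chunk(usable, 3)                # equal-length temporal segments
    label = random.randrange(len(PERMS))
    shuffled = torch.cat([segs[i] for i in PERMS[label]])
    return shuffled, label                       # classifier learns to predict label

x, y = permute_segments(torch.randn(30, 25, 3))  # 30 frames, 25 joints, xyz
```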
Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems.
GREW is the first large-scale dataset for gait recognition in the wild.
SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
Dynamic Curriculum Learning for Great Ape Detection in the Wild [14.212559301656]
We propose an end-to-end curriculum learning approach to improve detector construction in real-world jungle environments.
In contrast to previous semi-supervised methods, our approach gradually improves detection quality by steering training towards self-reinforcement.
We show that such virtuous dynamics and controls can avoid learning collapse and gradually tie detector adjustments to higher model quality.
arXiv Detail & Related papers (2022-04-30T14:02:52Z)
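One generic way to realise such self-reinforcing curriculum dynamics, not necessarily the paper's exact mechanism, is confidence-thresholded pseudo-labelling in which the admission threshold relaxes as training progresses, as sketched below.

```python
def curriculum_threshold(epoch, start=0.95, floor=0.70, decay=0.01):
    """Confidence threshold: strict early, gradually admitting harder labels."""
    return max(floor, start - decay * epoch)

def select_pseudo_labels(detections, epoch):
    """Keep only detections confident enough for the current curriculum stage."""
    tau = curriculum_threshold(epoch)
    return [d for d in detections if d["score"] >= tau]

batch = [{"box": (10, 20, 50, 80), "score": 0.97},
         {"box": (5, 5, 30, 40), "score": 0.80}]
kept = select_pseudo_labels(batch, epoch=0)      # keeps only the 0.97 detection
```

Keeping the early threshold strict is what guards against collapse: the detector only reinforces itself on labels it is already very sure about.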
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets.
To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK.
Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred onto four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
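To illustrate the aggregation idea, here is a compact sketch: a Sinkhorn iteration computes an entropic optimal transport plan between an input set and a trainable reference, and the plan pools the set into a fixed-size embedding. The entropic regulariser, iteration count, and pooling form are simplifications of the paper's construction.

```python
import torch
import torch.nn as nn

def sinkhorn(cost, eps=0.1, iters=20):
    """Entropic OT plan for an (n, m) cost matrix with uniform marginals."""
    cost = cost / cost.max()                     # scale for numerical stability
    K = torch.exp(-cost / eps)
    u = torch.full((cost.size(0),), 1.0 / cost.size(0))
    v = torch.full((cost.size(1),), 1.0 / cost.size(1))
    b = torch.ones_like(v)
    for _ in range(iters):
        a = u / (K @ b)                          # alternate marginal scalings
        b = v / (K.t() @ a)
    return a.unsqueeze(1) * K * b.unsqueeze(0)   # transport plan (n, m)

class OTPool(nn.Module):
    """Aggregates a variable-size feature set into a fixed-size embedding."""
    def __init__(self, dim=64, refs=8):
        super().__init__()
        self.ref = nn.Parameter(torch.randn(refs, dim))  # trainable reference

    def forward(self, x):                        # x: (n, dim), any n
        plan = sinkhorn(torch.cdist(x, self.ref) ** 2)
        return (plan.t() @ x).flatten()          # (refs * dim,) embedding

emb = OTPool()(torch.randn(17, 64))              # set of 17 -> fixed 512-dim
```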
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.