TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos
- URL: http://arxiv.org/abs/2211.09950v1
- Date: Thu, 17 Nov 2022 23:55:12 GMT
- Title: TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos
- Authors: Declan McIntosh and Tunai Porto Marques and Alexandra Branzan Albu and Rodney Rountree and Fabio De Leo
- Abstract summary: We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage, spatial-then-temporal encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
- Score: 63.85815474157357
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in cabled ocean observatories have increased the quality
and prevalence of underwater videos; this data enables the extraction of
high-level biologically relevant information such as species' behaviours.
Despite this increase in capability, most modern methods for the automatic
interpretation of underwater videos focus only on the detection and counting
of organisms. We propose an efficient computer vision- and deep learning-based
method for the detection of biological behaviours in videos. TempNet uses an
encoder bridge and residual blocks to maintain model performance with a
two-stage, spatial-then-temporal encoder. TempNet also introduces temporal
attention during spatial encoding, as well as Wavelet Down-Sampling
pre-processing, to improve model accuracy. Although our system is designed for
applications to diverse fish behaviours (i.e., it is generic), we demonstrate its
application to the detection of sablefish (Anoplopoma fimbria) startle events.
We compare the proposed approach with a state-of-the-art end-to-end video
detection method (ReMotENet) and a hybrid method previously offered exclusively
for the detection of sablefish's startle events in videos from an existing
dataset. Results show that our novel method comfortably outperforms the
comparison baselines in multiple metrics, reaching a per-clip accuracy and
precision of 80% and 0.81, respectively. This represents a relative improvement
of 31% in accuracy and 27% in precision over the compared methods using this
dataset. Our computational pipeline is also highly efficient, as it can process
each 4-second video clip in only 38 ms. Furthermore, since it does not employ
features specific to sablefish startle events, our system can be easily
extended to other behaviours in future works.
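As a rough illustration of the pipeline the abstract describes, the sketch below combines Haar-wavelet down-sampling as pre-processing, a per-frame spatial encoder, temporal attention across the frame axis, and a second-stage temporal encoder. It is a minimal reading of the abstract, not the authors' implementation: every module size, layer choice, and name here is an assumption.

```python
import torch
import torch.nn as nn

def haar_wavelet_downsample(x):
    """2D Haar transform used as a 2x down-sampling step.

    x: (B, T, C, H, W) video clip. Returns (B, T, 4*C, H/2, W/2) holding
    the LL/LH/HL/HH sub-bands, so spatial detail is moved into channels
    rather than discarded.
    """
    a = x[..., 0::2, 0::2]  # even rows, even cols
    b = x[..., 0::2, 1::2]  # even rows, odd cols
    c = x[..., 1::2, 0::2]  # odd rows, even cols
    d = x[..., 1::2, 1::2]  # odd rows, odd cols
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2
    hl = (a + b - c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=2)

class TwoStageBehaviourDetector(nn.Module):
    """Spatial-then-temporal encoder in the spirit of the abstract."""

    def __init__(self, in_ch=3, dim=64, heads=4):
        super().__init__()
        # Stage 1: per-frame spatial encoder (the paper uses residual
        # blocks and an encoder bridge; a plain conv stack stands in here).
        self.spatial = nn.Sequential(
            nn.Conv2d(4 * in_ch, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal attention over per-frame embeddings, standing in for
        # the "temporal attention during spatial encoding".
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stage 2: temporal encoder plus a clip-level classifier head.
        self.temporal = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)  # startle / no-startle logit

    def forward(self, clip):                            # clip: (B, T, C, H, W)
        x = haar_wavelet_downsample(clip)
        B, T = x.shape[:2]
        f = self.spatial(x.flatten(0, 1)).flatten(1)    # (B*T, dim)
        f = f.view(B, T, -1)                            # (B, T, dim)
        f, _ = self.temporal_attn(f, f, f)              # attend across time
        _, h = self.temporal(f)                         # summarise the clip
        return self.head(h[-1])                         # (B, 1)
```

For example, `TwoStageBehaviourDetector()(torch.randn(2, 16, 3, 64, 64))` yields one logit per 16-frame clip; the paper's actual encoder bridge and residual blocks are only summarised by the plain convolutional stack above.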
Related papers
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must contend with high across-frame variation in object appearance and diverse quality deterioration in some frames.
Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs.
This study presents a very simple yet potent strategy of feature selection and aggregation, gaining significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
- Graspness Discovery in Clutters for Fast and Accurate Grasp Detection [57.81325062171676]
"graspness" is a quality based on geometry cues that distinguishes graspable areas in cluttered scenes.
We develop a neural network named cascaded graspness model to approximate the searching process.
Experiments on a large-scale benchmark, GraspNet-1Billion, show that our method outperforms previous arts by a large margin.
arXiv Detail & Related papers (2024-06-17T02:06:47Z)
- It Takes Two: Masked Appearance-Motion Modeling for Self-supervised Video Transformer Pre-training [76.69480467101143]
Self-supervised video transformer pre-training has recently benefited from the mask-and-predict pipeline.
We explicitly investigate motion cues in videos as an extra prediction target and propose our Masked Appearance-Motion Modeling framework.
Our method learns generalized video representations and achieves 82.3% on Kinetics-400, 71.3% on Something-Something V2, 91.5% on UCF101, and 62.5% on HMDB51.
arXiv Detail & Related papers (2022-10-11T08:05:18Z)
- Real-world Video Anomaly Detection by Extracting Salient Features in Videos [0.0]
Existing methods use multiple-instance learning (MIL) to determine the normal/abnormal status of each segment of the video.
We propose a lightweight model with a self-attention mechanism that automatically extracts the features important for the normal/abnormal decision from all input segments (see the first sketch after this list).
Our method achieves accuracy comparable to or better than state-of-the-art methods.
arXiv Detail & Related papers (2022-09-14T06:03:09Z)
- ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding tasks such as temporal action detection (TAD) often suffer from heavy demands on computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z)
- A deep neural network for multi-species fish detection using multiple acoustic cameras [0.0]
We present a novel approach that takes advantage of both CNN (Convolutional Neural Network) and classical CV (Computer Vision) techniques.
The pipeline pre-processes the acoustic images to extract two features, in order to localise the signals and improve detection performance.
The YOLOv3-based model was trained on fish of multiple species recorded by two common acoustic cameras.
arXiv Detail & Related papers (2021-09-22T11:47:24Z)
- AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition [68.70214388982545]
Temporal modelling is key to efficient video action recognition.
We introduce an adaptive temporal fusion network, called AdaFuse, that fuses channels from current and past feature maps (see the second sketch after this list).
Our approach can achieve about 40% computation savings with comparable accuracy to state-of-the-art methods.
arXiv Detail & Related papers (2021-02-10T23:31:02Z)
- A Plug-and-play Scheme to Adapt Image Saliency Deep Model for Video Data [54.198279280967185]
This paper proposes a novel plug-and-play scheme to weakly retrain a pretrained image saliency deep model for video data.
Our method is simple yet effective for adapting any off-the-shelf pre-trained image saliency deep model to obtain high-quality video saliency detection.
arXiv Detail & Related papers (2020-08-02T13:23:14Z)
- Temperate Fish Detection and Classification: a Deep Learning based Approach [6.282069822653608]
We propose a two-step deep learning approach for the detection and classification of temperate fishes without pre-filtering.
The first step is to detect each individual fish in an image, independent of species and sex.
In the second step, we adopt a Convolutional Neural Network (CNN) with the Squeeze-and-Excitation (SE) architecture to classify each fish in the image without pre-filtering (a generic SE block is sketched third after this list).
arXiv Detail & Related papers (2020-05-14T12:40:57Z)
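Below are the three sketches referenced in the list above. First, the attention-weighted segment pooling suggested by the "Real-world Video Anomaly Detection" summary: self-attention scores pick out salient segments before a clip-level normal/abnormal classifier. This is a generic attention-pooling reading of the one-sentence summary, not that paper's model; feature and hidden sizes are assumptions.

```python
import torch
import torch.nn as nn

class SalientSegmentScorer(nn.Module):
    """Attention-weighted pooling over per-segment features (illustrative)."""

    def __init__(self, feat_dim=1024, hidden=128):
        super().__init__()
        # Scores each segment's importance from its own feature vector.
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, 1)    # abnormality logit

    def forward(self, segs):                        # segs: (B, N, feat_dim)
        w = torch.softmax(self.attn(segs), dim=1)   # (B, N, 1) segment weights
        video = (w * segs).sum(dim=1)               # weighted pool -> (B, feat_dim)
        return self.classifier(video)               # (B, 1)
```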
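Second, the channel-level fusion named in the AdaFuse entry: a lightweight gate blends each channel of the current feature map with the previous frame's. The published method makes hard, differentiable keep/reuse/skip decisions per channel to save computation; the soft gate below illustrates only the fusion itself.

```python
import torch
import torch.nn as nn

class SoftTemporalFusion(nn.Module):
    """Per-channel blend of current and previous feature maps (illustrative)."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # global context per map
            nn.Conv2d(2 * channels, channels, 1),  # one gate per channel
            nn.Sigmoid(),
        )

    def forward(self, cur, prev):                     # both: (B, C, H, W)
        g = self.gate(torch.cat([cur, prev], dim=1))  # (B, C, 1, 1)
        return g * cur + (1 - g) * prev               # per-channel fusion
```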
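Third, the standard Squeeze-and-Excitation (SE) block named in the temperate-fish entry's classification step; this is the generic SE design (Hu et al.), not that paper's full network.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: channel re-weighting from global context."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)     # (B, C, 1, 1) summary
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C, H, W)
        s = self.squeeze(x).flatten(1)             # (B, C)
        w = self.excite(s).view(x.size(0), -1, 1, 1)
        return x * w                               # re-weight channels
```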