Automatic evaluation of herding behavior in towed fishing gear using
end-to-end training of CNN and attention-based networks
- URL: http://arxiv.org/abs/2303.12016v1
- Date: Tue, 21 Mar 2023 16:52:08 GMT
- Title: Automatic evaluation of herding behavior in towed fishing gear using
end-to-end training of CNN and attention-based networks
- Authors: Orri Steinn Guðfinnsson, Týr Vilhjálmsson, Martin Eineborg and
Torfi Thorhallsson
- Abstract summary: The paper compares three deep action recognition network architectures, convolutional and attention-based, each trained end-to-end.
A two-stream CNN model, a CNN-transformer hybrid, and a pure transformer model achieved 63%, 54%, and 60% 10-fold cross-validated classification accuracy, respectively.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers the automatic classification of herding behavior in the
cluttered low-visibility environment that typically surrounds towed fishing
gear. The paper compares three convolutional and attention-based deep action
recognition network architectures trained end-to-end on a small set of video
sequences captured by a remotely controlled camera and classified by an expert
in fishing technology. The sequences depict a scene in front of a fishing trawl
where the conventional herding mechanism has been replaced by directed laser
light. The goal is to detect the presence of a fish in the sequence and
classify whether or not the fish reacts to the lasers. A two-stream CNN model,
a CNN-transformer hybrid, and a pure transformer model were trained end-to-end
to achieve 63%, 54%, and 60% 10-fold classification accuracy on the three-class
task when compared to the human expert. Inspection of the activation maps
learned by the three networks raises questions about the attributes of the
sequences the models may be learning, specifically whether changes in viewpoint
introduced by human camera operators that affect the position of laser lines in
the video frames may interfere with the classification. This underlines the
importance of careful experimental design when capturing scientific data for
automatic end-to-end evaluation and the usefulness of inspecting the trained
models.
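The evaluation protocol mentioned in the abstract can be illustrated with a minimal sketch, assuming that "10-fold" refers to standard 10-fold cross-validation against the expert's labels. The class names, the synthetic label counts, and the majority-class baseline below are hypothetical placeholders, not the paper's data or models.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(samples, labels, fit, predict, k=10, seed=0):
    """Hold each fold out once, train on the rest, and average fold accuracy."""
    fold_accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(samples)) if i not in held_out_set]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / len(fold_accuracies)


def fit_majority(train_samples, train_labels):
    """Placeholder 'model': remember the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    """Always predict the remembered majority label."""
    return model


# Hypothetical expert labels for 100 sequences (assumed three-class scheme).
labels = (["no_fish"] * 50
          + ["fish_no_reaction"] * 30
          + ["fish_reaction"] * 20)
samples = list(range(len(labels)))  # stand-ins for video sequences

accuracy = cross_validated_accuracy(samples, labels, fit_majority, predict_majority)
print(f"10-fold accuracy of the majority baseline: {accuracy:.2f}")
```

A trivial baseline like this gives a reference point for reading accuracies on a small three-class dataset: a trained network is only informative to the extent that it beats the score obtained by always predicting the most common class.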
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Unsupervised Fish Trajectory Tracking and Segmentation [2.1028463367241033]
We propose a three-stage framework for robust fish tracking and segmentation.
The first stage is an optical flow model, which generates the pseudo labels using spatial and temporal consistency between frames.
In the second stage, a self-supervised model refines the pseudo-labels incrementally.
In the third stage, the refined labels are used to train a segmentation network.
arXiv Detail & Related papers (2022-08-23T01:01:27Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We use a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- HCIL: Hierarchical Class Incremental Learning for Longline Fishing Visual Monitoring [30.084499552709183]
We introduce a Hierarchical Class Incremental Learning (HCIL) model, which significantly improves the state-of-the-art hierarchical classification methods under the CIL scenario.
A CIL system should be able to learn about more and more classes over time from a stream of data, i.e., only the training data for a small number of classes have to be present at the beginning and new classes can be added progressively.
arXiv Detail & Related papers (2022-02-25T23:53:11Z)
- Video-based Hierarchical Species Classification for Longline Fishing Monitoring [17.031967273526803]
Hierarchical classification based on videos allows for inexpensive and efficient fish species identification of catches from longline fishing.
With a known non-overlapping hierarchical data structure provided by fisheries scientists, our method enforces the hierarchical data structure.
Our experiments show that the proposed method outperforms the classic flat classification system significantly.
arXiv Detail & Related papers (2021-02-06T06:10:52Z)
- Movement Tracks for the Automatic Detection of Fish Behavior in Videos [63.85815474157357]
We offer a dataset of sablefish (Anoplopoma fimbria) startle behaviors in underwater videos, and investigate the use of deep learning (DL) methods for behavior detection on it.
Our proposed detection system identifies fish instances using DL-based frameworks, determines trajectory tracks, derives novel behavior-specific features, and employs Long Short-Term Memory (LSTM) networks to identify startle behavior in sablefish.
arXiv Detail & Related papers (2020-11-28T05:51:19Z)
- Auto-Rectify Network for Unsupervised Indoor Depth Estimation [119.82412041164372]
We establish that the complex ego-motions exhibited in handheld settings are a critical obstacle for learning depth.
We propose a data pre-processing method that rectifies training images by removing their relative rotations for effective learning.
Our results outperform the previous unsupervised SOTA method by a large margin on the challenging NYUv2 dataset.
arXiv Detail & Related papers (2020-06-04T08:59:17Z)
- Temperate Fish Detection and Classification: a Deep Learning based Approach [6.282069822653608]
We propose a two-step deep learning approach for the detection and classification of temperate fishes without pre-filtering.
The first step is to detect each single fish in an image, independent of species and sex.
In the second step, we adopt a Convolutional Neural Network (CNN) with the Squeeze-and-Excitation (SE) architecture for classifying each fish in the image without pre-filtering.
arXiv Detail & Related papers (2020-05-14T12:40:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.