Counting Fish with Temporal Representations of Sonar Video
- URL: http://arxiv.org/abs/2502.05129v1
- Date: Fri, 07 Feb 2025 18:02:28 GMT
- Title: Counting Fish with Temporal Representations of Sonar Video
- Authors: Kai Van Brunt, Justin Kay, Timm Haucke, Pietro Perona, Grant Van Horn, Sara Beery
- Abstract summary: We propose an alternative lightweight computer vision method for fish counting based on analyzing echograms.
We achieve a count error of 23% on representative data from the Kenai River in Alaska, demonstrating the feasibility of our approach.
- Score: 15.713015426791221
- License:
- Abstract: Accurate estimates of salmon escapement - the number of fish migrating upstream to spawn - are key data for conservation and fishery management. Existing methods for salmon counting using high-resolution imaging sonar hardware are non-invasive and compatible with computer vision processing. Prior work in this area has utilized object detection and tracking based methods for automated salmon counting. However, these techniques remain inaccessible to many sonar deployment sites due to limited compute and connectivity in the field. We propose an alternative lightweight computer vision method for fish counting based on analyzing echograms - temporal representations that compress several hundred frames of imaging sonar video into a single image. We predict upstream and downstream counts within 200-frame time windows directly from echograms using a ResNet-18 model, and propose a set of domain-specific image augmentations and a weakly-supervised training protocol to further improve results. We achieve a count error of 23% on representative data from the Kenai River in Alaska, demonstrating the feasibility of our approach.
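The abstract describes echograms as images that compress several hundred sonar frames, with counts predicted per 200-frame window. The following is a minimal sketch of that idea, not the authors' implementation: the abstract does not say how each frame is reduced, so the max-over-beams projection, the `(T, range, beam)` array layout, and the function names `make_echogram` and `split_windows` are all assumptions made for illustration.

```python
import numpy as np


def make_echogram(frames: np.ndarray) -> np.ndarray:
    """Collapse a sonar clip of shape (T, range, beam) into a (range, T) image.

    Assumption: each frame is reduced to one image column by taking the
    maximum intensity across beams. The paper may use a different reduction.
    """
    if frames.ndim != 3:
        raise ValueError("expected frames with shape (T, range, beam)")
    columns = frames.max(axis=2)  # (T, range): one column per frame
    return columns.T              # (range, T): time runs left to right


def split_windows(echogram: np.ndarray, window: int = 200) -> list[np.ndarray]:
    """Cut an echogram into consecutive, non-overlapping windows of `window` frames.

    Trailing frames that do not fill a whole window are dropped.
    """
    n_full = echogram.shape[1] // window
    return [echogram[:, i * window:(i + 1) * window] for i in range(n_full)]


# Synthetic stand-in for a sonar clip: 450 frames, 64 range bins, 48 beams.
rng = np.random.default_rng(0)
clip = rng.random((450, 64, 48), dtype=np.float32)

echogram = make_echogram(clip)
windows = split_windows(echogram)
print(echogram.shape, len(windows), windows[0].shape)
```

In a pipeline like the one the abstract outlines, each window would then be passed to an image model (a ResNet-18 in the paper) that outputs an upstream and a downstream count; that step is omitted here.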
Related papers
- A framework for river connectivity classification using temporal image processing and attention based neural networks [0.0]
Extreme weather events associated with climate change can result in alterations to river and stream connectivity.
Traditional stream flow gauges are costly to deploy and limited to large river bodies.
Trail camera methods are a low-cost and easily deployed alternative to collect hourly data.
arXiv Detail & Related papers (2025-02-01T16:00:28Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- Learning Robust Multi-Scale Representation for Neural Radiance Fields from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z)
- TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos [63.85815474157357]
We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos.
TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-staged, spatial, then temporal, encoder.
We demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events.
arXiv Detail & Related papers (2022-11-17T23:55:12Z)
- Self-Supervised Multi-Frame Monocular Scene Flow [61.588808225321735]
We introduce a multi-frame monocular scene flow network based on self-supervised learning.
We observe state-of-the-art accuracy among monocular scene flow methods based on self-supervised learning.
arXiv Detail & Related papers (2021-05-05T17:49:55Z)
- Deep learning with self-supervision and uncertainty regularization to count fish in underwater images [28.261323753321328]
Effective conservation actions require effective population monitoring.
Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive.
Counting animals from such data is challenging, particularly when densely packed in noisy images.
Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals.
arXiv Detail & Related papers (2021-04-30T13:02:19Z)
- FisheyeSuperPoint: Keypoint Detection and Description Network for Fisheye Images [2.187613144178315]
Keypoint detection and description is a commonly used building block in computer vision systems.
SuperPoint is a self-supervised keypoint detector and descriptor that has achieved state-of-the-art results on homography estimation.
We introduce a fisheye adaptation pipeline to enable training on undistorted fisheye images.
arXiv Detail & Related papers (2021-02-27T11:26:34Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras produce brightness changes in the form of a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data, such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Single Image Super-Resolution for Domain-Specific Ultra-Low Bandwidth Image Transmission [1.5469452301122177]
Low-bandwidth communication, such as underwater acoustic communication, is limited by best-case data rates of 30-50 kbit/s.
This is investigated on a large, diverse dataset obtained during years of trawl fishing where cameras have been placed in the fishing nets.
A neural network is then trained to perform up-sampling, trying to reconstruct the original image.
arXiv Detail & Related papers (2020-09-09T06:44:30Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
- Temperate Fish Detection and Classification: a Deep Learning based Approach [6.282069822653608]
We propose a two-step deep learning approach for the detection and classification of temperate fishes without pre-filtering.
The first step is to detect each single fish in an image, independent of species and sex.
In the second step, we adopt a Convolutional Neural Network (CNN) with the Squeeze-and-Excitation (SE) architecture for classifying each fish in the image without pre-filtering.
arXiv Detail & Related papers (2020-05-14T12:40:57Z)