Single-stage intake gesture detection using CTC loss and extended prefix beam search
- URL: http://arxiv.org/abs/2008.02999v2
- Date: Sat, 21 Nov 2020 01:05:45 GMT
- Title: Single-stage intake gesture detection using CTC loss and extended prefix beam search
- Authors: Philipp V. Rouast and Marc T. P. Adam
- Abstract summary: Accurate detection of individual intake gestures is a key step towards automatic dietary monitoring.
We propose a single-stage approach which directly decodes the probabilities learned from sensor data into sparse intake detections.
- Score: 8.22379888383833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection of individual intake gestures is a key step towards
automatic dietary monitoring. Both inertial sensor data of wrist movements and
video data depicting the upper body have been used for this purpose. The most
advanced approaches to date use a two-stage approach, in which (i) frame-level
intake probabilities are learned from the sensor data using a deep neural
network, and then (ii) sparse intake events are detected by finding the maxima
of the frame-level probabilities. In this study, we propose a single-stage
approach which directly decodes the probabilities learned from sensor data into
sparse intake detections. This is achieved by weakly supervised training using
Connectionist Temporal Classification (CTC) loss, and decoding using a novel
extended prefix beam search decoding algorithm. Benefits of this approach
include (i) end-to-end training for detections, (ii) simplified timing
requirements for intake gesture labels, and (iii) improved detection
performance compared to existing approaches. Across two separate datasets, we
achieve relative $F_1$ score improvements between 1.9% and 6.2% over the
two-stage approach for intake detection and eating/drinking detection tasks,
for both video and inertial sensors.
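To make the contrast with the two-stage baseline (frame-level probabilities followed by peak finding) concrete, the sketch below illustrates the single-stage idea: weakly supervised training with CTC loss, where each clip is labelled only with its ordered sequence of intake events rather than frame-level timings. This is a minimal sketch assuming PyTorch; the encoder, the label scheme (blank = 0, eat = 1, drink = 2), and all shapes are illustrative assumptions, not the authors' architecture, and the paper's novel extended prefix beam search is replaced here by a plain greedy CTC decode for brevity.

```python
# Minimal sketch (not the authors' code) of single-stage training with CTC loss.
import torch
import torch.nn as nn

NUM_CLASSES = 3  # assumed labels: 0 = CTC blank, 1 = eat, 2 = drink

class FrameEncoder(nn.Module):
    """Toy stand-in for the deep network that maps sensor frames to
    per-frame class log-probabilities."""
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, x):                      # x: (batch, time, in_dim)
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(-1)    # (batch, time, classes)

model = FrameEncoder()
ctc_loss = nn.CTCLoss(blank=0)

# One batch of 4 clips, each 128 frames of 6-axis inertial data.
x = torch.randn(4, 128, 6)
log_probs = model(x).transpose(0, 1)           # CTCLoss expects (time, batch, classes)

# Weak supervision: only the ordered event sequence per clip, no timings.
targets = torch.tensor([1, 1, 2, 1, 2, 1, 1])  # concatenated label sequences
target_lengths = torch.tensor([2, 2, 2, 1])    # number of events per clip
input_lengths = torch.full((4,), 128, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()

def greedy_ctc_decode(log_probs):
    """Baseline decode: collapse repeats, drop blanks. The paper's extended
    prefix beam search replaces this step and additionally recovers the
    time of each detected event."""
    best = log_probs.argmax(-1)                # (time, batch)
    decoded = []
    for b in range(best.shape[1]):
        seq, prev = [], 0
        for t in best[:, b].tolist():
            if t != 0 and t != prev:
                seq.append(t)
            prev = t
        decoded.append(seq)
    return decoded

print(greedy_ctc_decode(log_probs))            # sparse event sequences per clip
```

Standard prefix beam search keeps the most probable label prefixes alive during decoding, tracking blank and non-blank path probabilities separately; per the abstract, the paper's extension to this algorithm is what turns the decoded sequences into localized intake detections.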
Related papers
- A Low-cost Strategic Monitoring Approach for Scalable and Interpretable Error Detection in Deep Neural Networks [6.537257913467249]
We present a highly compact run-time monitoring approach for deep computer vision networks.
It can efficiently detect silent data corruption originating from both hardware memory and input faults.
arXiv Detail & Related papers (2023-10-31T10:45:55Z)
- DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection [41.436817746749384]
Diffusion Model is a scalable data engine for object detection.
DiffusionEngine (DE) provides high-quality detection-oriented training pairs in a single stage.
arXiv Detail & Related papers (2023-09-07T17:55:01Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- A Novel Two Stream Decision Level Fusion of Vision and Inertial Sensors Data for Automatic Multimodal Human Activity Recognition System [2.5214116139219787]
This paper presents a novel multimodal human activity recognition system.
It uses a two-stream decision level fusion of vision and inertial sensors.
The accuracies obtained by the proposed system are 96.9%, 97.6%, 98.7%, and 95.9%, respectively.
arXiv Detail & Related papers (2023-06-27T19:29:35Z)
- Multi-modal Sensor Data Fusion for In-situ Classification of Animal Behavior Using Accelerometry and GNSS Data [16.47484520898938]
We examine the use of data from multiple sensing modes, i.e., accelerometry and global navigation satellite system (GNSS) data, for classifying animal behavior.
We develop multi-modal animal behavior classification algorithms using two real-world datasets collected via smart cattle collars and ear tags.
arXiv Detail & Related papers (2022-06-24T04:54:03Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas for leveraging weakly annotated datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
- From One to Many: A Deep Learning Coincident Gravitational-Wave Search [58.720142291102135]
We construct a two-detector search for gravitational waves from binary black hole mergers using neural networks trained on non-spinning binary black hole data from a single detector.
We find that none of these simple two-detector networks are capable of improving the sensitivity over applying networks individually to the data from the detectors.
arXiv Detail & Related papers (2021-08-24T13:25:02Z)
- ESAD: End-to-end Deep Semi-supervised Anomaly Detection [85.81138474858197]
We propose a new objective function that measures the KL-divergence between normal and anomalous data.
The proposed method significantly outperforms several state-of-the-art methods on multiple benchmark datasets.
arXiv Detail & Related papers (2020-12-09T08:16:35Z)
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
- Sequential Drift Detection in Deep Learning Classifiers [4.022057598291766]
We utilize neural network embeddings to detect data drift by formulating the drift detection within an appropriate sequential decision framework.
We introduce a loss function which evaluates an algorithm's ability to balance these two concerns.
arXiv Detail & Related papers (2020-07-31T14:46:21Z)
- EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system.
It can be trained in one shot on both fully and weakly-annotated data.
It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.