Seeing More with Less: Video Capsule Endoscopy with Multi-Task Learning
- URL: http://arxiv.org/abs/2507.23479v1
- Date: Thu, 31 Jul 2025 12:00:25 GMT
- Title: Seeing More with Less: Video Capsule Endoscopy with Multi-Task Learning
- Authors: Julia Werner, Oliver Bause, Julius Oexle, Maxime Le Floch, Franz Brinkmann, Jochen Hampe, Oliver Bringmann
- Abstract summary: We introduce a multi-task neural network that combines the functionalities of precise self-localization within the gastrointestinal tract with the ability to detect anomalies in the small intestine within a single model. Our model achieves an accuracy of 93.63% on the localization task and an accuracy of 87.48% on the anomaly detection task.
- Score: 0.8824955686704116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video capsule endoscopy has become increasingly important for investigating the small intestine within the gastrointestinal tract. However, the short battery lifetime of such compact sensor edge devices remains a persistent challenge. Integrating artificial intelligence can help overcome this limitation by enabling intelligent real-time decision-making, thereby reducing energy consumption and prolonging battery life. However, this remains challenging due to data sparsity and the limited resources of the device, which restrict the overall model size. In this work, we introduce a multi-task neural network that combines the functionalities of precise self-localization within the gastrointestinal tract with the ability to detect anomalies in the small intestine within a single model. Throughout the development process, we consistently restricted the total number of parameters to ensure the feasibility of deploying such a model in a small capsule. We report the first multi-task results using the recently published Galar dataset, integrating established multi-task methods and Viterbi decoding for subsequent time-series analysis. This outperforms current single-task models and represents a significant advance in AI-based approaches in this field. Our model achieves an accuracy of 93.63% on the localization task and an accuracy of 87.48% on the anomaly detection task. The approach requires only 1 million parameters while surpassing the current baselines.
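The abstract mentions Viterbi decoding as a time-series step applied after the per-frame predictions. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes three hypothetical section labels (0 = stomach, 1 = small intestine, 2 = colon), a hand-picked left-to-right transition matrix, and made-up per-frame probabilities. None of these values come from the paper.

```python
import numpy as np

def viterbi_decode(frame_probs, transition, prior):
    """Most likely label sequence given per-frame class probabilities.

    frame_probs: (T, K) classifier outputs, one row per frame.
    transition:  (K, K) matrix, transition[i, j] = P(next = j | current = i).
    prior:       (K,) probability of each label at the first frame.
    """
    eps = 1e-12  # keeps log() finite for zero probabilities
    log_emit = np.log(frame_probs + eps)
    log_trans = np.log(transition + eps)
    T, K = log_emit.shape

    score = np.log(prior + eps) + log_emit[0]  # best log-score ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[i, j]: from label i to label j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the best final label.
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path

# Hypothetical values: the capsule can only stay in a section or move forward.
transition = np.array([[0.95, 0.05, 0.00],
                       [0.00, 0.95, 0.05],
                       [0.00, 0.00, 1.00]])
prior = np.array([1.0, 0.0, 0.0])
frame_probs = np.array([[0.90, 0.10, 0.00],
                        [0.80, 0.20, 0.00],
                        [0.30, 0.60, 0.10],   # noisy frame
                        [0.70, 0.20, 0.10],
                        [0.20, 0.70, 0.10],
                        [0.10, 0.80, 0.10],
                        [0.05, 0.15, 0.80],
                        [0.00, 0.10, 0.90]])

raw = frame_probs.argmax(axis=1)                          # per-frame decision
smoothed = viterbi_decode(frame_probs, transition, prior)  # sequence-level decision
print(raw.tolist())       # [0, 0, 1, 0, 1, 1, 2, 2]
print(smoothed.tolist())  # [0, 0, 0, 0, 1, 1, 2, 2]
```

In this toy run the per-frame argmax jumps to the small intestine and back, which is anatomically implausible, while the decoded path only moves forward. How the transition probabilities are chosen or estimated in the actual work is not stated in the abstract.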
Related papers
- Enhanced Anomaly Detection for Capsule Endoscopy Using Ensemble Learning Strategies [0.8824955686704116]
This work introduces an ensemble strategy to address the challenge in anomaly detection tasks in video capsule endoscopies. We propose using various loss functions, drawn from the anomaly detection field, to train each network. We achieve an AUC score of 76.86% on the Kvasir-Capsule and an AUC score of 76.98% on the Galar dataset.
arXiv Detail & Related papers (2025-04-08T13:39:39Z)
- Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection [87.22082662250999]
We introduce HEaD (Hallucination Early Detection), a new paradigm designed to swiftly detect incorrect generations at the beginning of the diffusion process.
We demonstrate that using HEaD saves computational resources and accelerates the generation process to get a complete image.
Our findings reveal that HEaD can save up to 12% of the generation time on a two objects scenario.
arXiv Detail & Related papers (2024-09-16T18:00:00Z)
- SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery [7.863539113283565]
We propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection. SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos. Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases.
arXiv Detail & Related papers (2024-06-22T19:20:35Z)
- An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification [2.2940141855172036]
In molecular biology, there has been an explosion of data generated from multi-omics sequencing.
Traditional statistical methods face challenging tasks when dealing with such high dimensional data.
This study focuses on tackling these challenges with a neural network that incorporates an autoencoder to extract a latent space of the features.
arXiv Detail & Related papers (2024-05-16T01:45:55Z)
- GENOT: Entropic (Gromov) Wasserstein Flow Matching with Applications to Single-Cell Genomics [20.01834405021846]
Single-cell genomics has advanced our understanding of cellular behavior, catalyzing innovations in treatments and precision medicine.
Traditional discrete solvers are hampered by scalability, privacy, and out-of-sample estimation issues.
We present neural network-based solvers, known as neural OT solvers, that parameterize OT maps.
We demonstrate its versatility and robustness through applications in cell development studies, cellular drug response modeling, and cross-modality cell translation.
arXiv Detail & Related papers (2023-10-13T17:12:04Z)
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
- DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting [14.222014969736993]
Multi-class cell detection and counting is an essential task for many pathological diagnoses.
We propose guided posterior regularization (DeGPR) which assists an object detector by guiding it to exploit discriminative features among cells.
We validate our model on two publicly available datasets, and on MuCeD, a novel dataset that we contribute.
arXiv Detail & Related papers (2023-04-03T06:25:45Z)
- Fast Exploration of the Impact of Precision Reduction on Spiking Neural Networks [63.614519238823206]
Spiking Neural Networks (SNNs) are a practical choice when the target hardware reaches the edge of computing.
We employ an Interval Arithmetic (IA) model to develop an exploration methodology that takes advantage of the capability of such a model to propagate the approximation error.
arXiv Detail & Related papers (2022-11-22T15:08:05Z)
- Deep neural networks approach to microbial colony detection -- a comparative analysis [52.77024349608834]
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
The achieved results may serve as a benchmark for future experiments.
arXiv Detail & Related papers (2021-08-23T12:06:00Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.