In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image
Domains
- URL: http://arxiv.org/abs/2111.15124v1
- Date: Tue, 30 Nov 2021 04:56:16 GMT
- Title: In-Bed Human Pose Estimation from Unseen and Privacy-Preserving Image
Domains
- Authors: Ting Cao, Mohammad Ali Armin, Simon Denman, Lars Petersson, David
Ahmedt-Aristizabal
- Abstract summary: In-bed human posture estimation provides important health-related metrics with potential value in medical condition assessments.
We propose a multi-modal conditional variational autoencoder (MC-VAE) capable of reconstructing features from missing modalities seen during training.
- We demonstrate that body positions can be effectively recognized from the available modality, achieving on-par results with baseline models.
- Score: 22.92165116962952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical applications have benefited from the rapid advancement in computer
vision. For patient monitoring in particular, in-bed human posture estimation
provides important health-related metrics with potential value in medical
condition assessments. Despite great progress in this domain, it remains a
challenging task due to substantial ambiguity during occlusions, and the lack
of large corpora of manually labeled data for model training, particularly with
domains such as thermal infrared imaging which are privacy-preserving, and thus
of great interest. Motivated by the effectiveness of self-supervised methods in
learning features directly from data, we propose a multi-modal conditional
variational autoencoder (MC-VAE) capable of reconstructing features from
missing modalities seen during training. This approach is used with HRNet to
enable single modality inference for in-bed pose estimation. Through extensive
evaluations, we demonstrate that body positions can be effectively recognized
from the available modality, achieving on-par results with baseline models that
are highly dependent on having access to multiple modalities at inference time.
The proposed framework supports future research towards self-supervised
learning that generates a robust model from a single source and is expected to
generalize over many unknown distributions in clinical environments.
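As a rough illustration of the single-modality inference idea described above (a minimal sketch with toy linear weights, not the authors' trained MC-VAE or HRNet), the snippet below shows how features of a missing modality could be reconstructed from the one available at test time and concatenated for a downstream pose head. All dimensions and weight matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: RGB and thermal (LWIR) feature vectors, shared latent.
D_RGB, D_LWIR, D_Z = 64, 64, 16

# Toy linear encoder/decoder weights stand in for a trained conditional VAE.
W_enc = rng.standard_normal((D_LWIR, 2 * D_Z)) * 0.1  # thermal -> (mu, log_var)
W_dec = rng.standard_normal((D_Z, D_RGB)) * 0.1       # latent  -> reconstructed RGB features

def encode(lwir_feat):
    """Map the available (thermal) features to a latent Gaussian."""
    h = lwir_feat @ W_enc
    mu, log_var = h[:D_Z], h[D_Z:]
    return mu, log_var

def reconstruct_missing(lwir_feat):
    """At inference, reconstruct features of the missing RGB modality
    from the single available thermal modality (mean latent, no sampling)."""
    mu, _ = encode(lwir_feat)
    return mu @ W_dec

lwir = rng.standard_normal(D_LWIR)
rgb_hat = reconstruct_missing(lwir)

# Real thermal features plus reconstructed RGB features would then feed
# the pose estimator (HRNet in the paper).
pose_input = np.concatenate([lwir, rgb_hat])
```

During training, both modalities would be available and the VAE would be fit to reconstruct one from the other; the point of the sketch is only the inference-time data flow when one modality is absent.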
Related papers
- Improving Out-of-distribution Human Activity Recognition via IMU-Video Cross-modal Representation Learning [3.177649348456073]
Human Activity Recognition (HAR) based on wearable inertial sensors plays a critical role in remote health monitoring. We propose a new cross-modal self-supervised pretraining approach to learn representations from large-scale unlabeled IMU-video data. Our results indicate that the proposed cross-modal pretraining approach outperforms the current state-of-the-art IMU-video pretraining approach.
arXiv Detail & Related papers (2025-07-17T18:47:46Z) - GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning [50.94508930739623]
Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. Current methods still suffer from limited answer reliability and poor interpretability, impairing the ability of clinicians and patients to understand and trust model-generated answers. This work first proposes a Thinking with Visual Grounding dataset wherein the answer generation is decomposed into intermediate reasoning steps. We introduce a novel verifiable reward mechanism for reinforcement learning to guide post-training, improving the alignment between the model's reasoning process and its final answer.
arXiv Detail & Related papers (2025-06-22T08:09:58Z) - A Vector-Quantized Foundation Model for Patient Behavior Monitoring [43.02353546717171]
This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices. We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need for fine-tuning.
arXiv Detail & Related papers (2025-03-19T14:01:16Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Enhancing Apparent Personality Trait Analysis with Cross-Modal Embeddings [0.5461938536945723]
We present a multimodal deep neural network with a Siamese extension for apparent personality trait prediction trained on short video recordings.
Because the target distribution of the analyzed dataset is highly concentrated, even changes in the third decimal digit are meaningful.
arXiv Detail & Related papers (2024-05-06T20:51:28Z) - Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z) - On the Out of Distribution Robustness of Foundation Models in Medical
Image Segmentation [47.95611203419802]
Foundation models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach.
We compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset.
We further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution data.
arXiv Detail & Related papers (2023-11-18T14:52:10Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Safe AI for health and beyond -- Monitoring to transform a health
service [51.8524501805308]
We will assess the infrastructure required to monitor the outputs of a machine learning algorithm.
We will present two scenarios with examples of monitoring and updates of models.
arXiv Detail & Related papers (2023-03-02T17:27:45Z) - On the Robustness of Pretraining and Self-Supervision for a Deep
Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading.
We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions.
Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z) - Self-Supervised Graph Learning with Hyperbolic Embedding for Temporal
Health Event Prediction [13.24834156675212]
We propose a hyperbolic embedding method with information flow to pre-train medical code representations in a hierarchical structure.
We incorporate these pre-trained representations into a graph neural network to detect disease complications.
We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data.
arXiv Detail & Related papers (2021-06-09T00:42:44Z) - Concept-based model explanations for Electronic Health Records [1.6837409766909865]
Testing with Concept Activation Vectors (TCAV) has recently been introduced as a way of providing human-understandable explanations.
We propose an extension of the method to time series data to enable an application of TCAV to sequential predictions in the EHR.
arXiv Detail & Related papers (2020-12-03T22:18:24Z)
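The TCAV entry above can be made concrete with a small sketch (hypothetical throughout; the original method fits a linear classifier to obtain the concept vector, while this toy uses the simpler mean-difference variant, and the time-series aggregation is only an assumed illustration of the proposed extension):

```python
import numpy as np

rng = np.random.default_rng(1)
D, T = 8, 5  # hypothetical hidden-layer width and number of time steps

# Toy hidden activations: examples exhibiting a concept vs. random counterexamples.
concept_acts = rng.standard_normal((50, D)) + 1.0
random_acts = rng.standard_normal((50, D))

# Mean-difference concept activation vector (CAV); TCAV proper trains a
# linear classifier and takes the normal to its decision boundary.
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

# Toy nonlinear prediction head standing in for the model's output layer.
W1 = rng.standard_normal((D, 16)) * 0.5
w2 = rng.standard_normal(16) * 0.5

def head(a):
    return np.maximum(a @ W1, 0.0) @ w2

def concept_sensitivity(a, eps=1e-4):
    # Directional derivative of the score along the CAV (finite differences).
    return (head(a + eps * cav) - head(a)) / eps

# Sequential extension: score each step of a sequence of activations,
# then report the fraction of steps with positive concept sensitivity.
sequence = rng.standard_normal((T, D))
sens = np.array([concept_sensitivity(sequence[t]) for t in range(T)])
tcav_score = float((sens > 0).mean())
```

The resulting score lies in [0, 1] and reads as "the fraction of time steps where the concept pushes the prediction up", which is the kind of human-understandable summary TCAV aims to provide.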
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.