Related papers: Acoustic-based 3D Human Pose Estimation Robust to Human Position

URL: http://arxiv.org/abs/2411.07165v1
Date: Fri, 08 Nov 2024 15:56:12 GMT
Title: Acoustic-based 3D Human Pose Estimation Robust to Human Position
Authors: Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa
Abstract summary: The existing active acoustic sensing-based approach for 3D human pose estimation implicitly assumes that the target user is positioned along a line between loudspeakers and a microphone. Because reflection and diffraction of sound by the human body cause subtle acoustic signal changes compared to sound obstruction, the existing model degrades its accuracy significantly when subjects deviate from this line. To overcome this limitation, we propose a novel method composed of a position discriminator and reverberation-resistant model.
Score: 16.0759003139539
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper explores the problem of 3D human pose estimation from only low-level acoustic signals. The existing active acoustic sensing-based approach for 3D human pose estimation implicitly assumes that the target user is positioned along a line between loudspeakers and a microphone. Because reflection and diffraction of sound by the human body cause subtle acoustic signal changes compared to sound obstruction, the existing model degrades its accuracy significantly when subjects deviate from this line, limiting its practicality in real-world scenarios. To overcome this limitation, we propose a novel method composed of a position discriminator and reverberation-resistant model. The former predicts the standing positions of subjects and applies adversarial learning to extract subject position-invariant features. The latter utilizes acoustic signals before the estimation target time as references to enhance robustness against the variations in sound arrival times due to diffraction and reflection. We construct an acoustic pose estimation dataset that covers diverse human locations and demonstrate through experiments that our proposed method outperforms existing approaches.
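The adversarial step described in the abstract can be illustrated with a toy sketch. The abstract does not say how the adversarial learning is implemented; the version below assumes a gradient-reversal scheme, which is one common choice, and uses scalar stand-ins for the encoder, pose head, and position discriminator. The function name, the weights `w`, `a`, `b`, and the trade-off factor `lam` are all hypothetical and do not come from the paper.

```python
# Illustrative only: scalar stand-ins for the encoder, pose head, and
# position discriminator. None of these names come from the paper.

def adversarial_grads(w, a, b, x, pose_target, position_label, lam=0.5):
    """Return (grad_w, grad_a, grad_b) for one sample under gradient reversal.

    feature  f = w * x   (encoder, weight w)
    pose     p = a * f   (pose head, weight a)
    position d = b * f   (position discriminator, weight b)
    """
    f = w * x
    pose_err = a * f - pose_target
    pos_err = b * f - position_label

    # Squared-error losses: L_pose = pose_err**2, L_pos = pos_err**2.
    grad_a = 2.0 * pose_err * f   # dL_pose/da
    grad_b = 2.0 * pos_err * f    # dL_pos/db (the discriminator minimizes L_pos)

    grad_w_pose = 2.0 * pose_err * a * x   # dL_pose/dw
    grad_w_pos = 2.0 * pos_err * b * x     # dL_pos/dw

    # Gradient reversal (assumed): the encoder descends on L_pose but
    # ascends on L_pos, pushing it toward features that carry less
    # information about where the subject is standing.
    grad_w = grad_w_pose - lam * grad_w_pos
    return grad_w, grad_a, grad_b
```

The discriminator weight `b` is updated to predict position better, while the encoder weight `w` receives the position gradient with its sign flipped. A real system would rely on an automatic-differentiation framework rather than hand-written gradients.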

Related papers

Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization [4.720552406377147]
We propose a technique that aligns adversarial perturbations with low-level acoustic characteristics derived from speech representation models. Our method is plug-and-play and can be integrated with any existing attack methods.
arXiv Detail & Related papers (2025-03-25T12:14:10Z)
Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models [21.669363620480333]
We present an unsupervised method for blind dereverberation and room impulse response estimation, called BUDDy. In a blind scenario where the room impulse response is unknown, BUDDy successfully performs speech dereverberation. Unlike supervised methods, which often struggle to generalize, BUDDy seamlessly adapts to different acoustic conditions.
arXiv Detail & Related papers (2024-08-14T11:31:32Z)
3D Human Pose Analysis via Diffusion Synthesis [65.268245109828]
PADS represents the first diffusion-based framework for tackling general 3D human pose analysis within the inverse problem framework. Its performance has been validated on different benchmarks, signaling the adaptability and robustness of this pipeline.
arXiv Detail & Related papers (2024-01-17T02:59:34Z)
An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples [2.2866551516539726]
A viable adversarial audio file is produced and then fine-tuned with respect to perceptibility and robustness. We present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step.
arXiv Detail & Related papers (2023-10-05T06:59:09Z)
Bayesian inference and neural estimation of acoustic wave propagation [10.980762871305279]
We introduce a novel framework which combines physics and machine learning methods to analyse acoustic signals. Three methods are developed for this task: a Bayesian inference approach for inferring the spectral acoustics characteristics, a neural-physical model which equips a neural network with forward and backward physical losses, and a non-linear least squares approach which serves as a benchmark. The simplicity and efficiency of this framework are empirically validated on simulated data.
arXiv Detail & Related papers (2023-05-28T15:14:46Z)
Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion [70.60038549155485]
Deep 3D point cloud models are sensitive to adversarial attacks, which poses threats to safety-critical applications such as autonomous driving. This paper introduces a novel distortion-aware defense framework that can rebuild the pristine data distribution with a tailored intensity estimator and a diffusion model.
arXiv Detail & Related papers (2022-11-29T14:32:43Z)
Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision. We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target. We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
Bayesian Learning for Deep Neural Network Adaptation [57.70991105736059]
A key task for speech recognition systems is to reduce the mismatch between training and evaluation data that is often attributable to speaker differences. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness. This paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty.
arXiv Detail & Related papers (2020-12-14T12:30:41Z)
Informed Source Extraction With Application to Acoustic Echo Reduction [8.296684637620553]
Deep learning methods leverage a speaker discriminative model that maps a reference snippet uttered by the target speaker into a single embedding vector. We propose a time-varying source discriminative model that captures the temporal dynamics of the reference signal. Experimental results demonstrate that the proposed method significantly improves the extraction performance when applied in an acoustic echo reduction scenario.
arXiv Detail & Related papers (2020-11-09T17:13:23Z)
Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.