Invisible-to-Visible: Privacy-Aware Human Instance Segmentation using
Airborne Ultrasound via Collaborative Learning Variational Autoencoder
- URL: http://arxiv.org/abs/2204.07280v1
- Date: Fri, 15 Apr 2022 00:56:01 GMT
- Title: Invisible-to-Visible: Privacy-Aware Human Instance Segmentation using
Airborne Ultrasound via Collaborative Learning Variational Autoencoder
- Authors: Risako Tanigawa, Yasunori Ishii, Kazuki Kozuka and Takayoshi Yamashita
- Abstract summary: We propose a new task for human instance segmentation from invisible information, especially airborne ultrasound, for action recognition.
To perform instance segmentation from invisible information, we first convert sound waves into reflected sound directional images (sound images).
At inference time, instance segmentation results can be obtained from sound images alone.
- Score: 8.21448246263952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For action understanding in indoor environments, human pose and
action must be recognized while preserving privacy. Although camera images
enable highly accurate human action recognition, they do not preserve privacy.
We therefore propose a new task: human instance segmentation from invisible
information, specifically airborne ultrasound, for action recognition. To
perform instance segmentation from invisible information, we first convert
sound waves into reflected sound directional images (sound images). Although
sound images can roughly identify a person's location, the detailed shape
remains ambiguous. To address this problem, we propose a collaborative
learning variational autoencoder (CL-VAE) that uses sound and RGB images
simultaneously during training. At inference time, instance segmentation
results can be obtained from sound images alone. In performance evaluations,
CL-VAE estimated human instance segmentations more accurately than a
conventional variational autoencoder and several other models. Because this
method obtains a separate segmentation for each person, it could be applied to
human action recognition tasks with privacy protection.
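The abstract describes the CL-VAE only at a high level: two modality-specific encoders map sound and RGB images into a shared latent space, the RGB branch guides the sound branch during training, and inference uses the sound branch alone. A minimal NumPy sketch of that general idea follows; the layer sizes, weight names, and exact loss terms are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy flattened-image dimensions (illustrative, not from the paper).
D_IN, D_LAT, D_OUT = 64, 8, 64

def linear(d_in, d_out):
    """Randomly initialized affine layer (weights, bias)."""
    return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

# Two modality-specific encoders into one shared latent space.
W_snd_mu, b_snd_mu = linear(D_IN, D_LAT)
W_snd_lv, b_snd_lv = linear(D_IN, D_LAT)
W_rgb_mu, b_rgb_mu = linear(D_IN, D_LAT)
W_rgb_lv, b_rgb_lv = linear(D_IN, D_LAT)
# One shared decoder from latent code to segmentation mask.
W_dec, b_dec = linear(D_LAT, D_OUT)

def encode(x, W_mu, b_mu, W_lv, b_lv):
    """Return (mean, log-variance) of the Gaussian posterior."""
    return x @ W_mu + b_mu, x @ W_lv + b_lv

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z):
    """Sigmoid mask in (0, 1) per output pixel."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec + b_dec)))

def kl_gauss(mu_q, lv_q, mu_p, lv_p):
    """KL( N(mu_q, e^lv_q) || N(mu_p, e^lv_p) ), summed over latent dims."""
    return 0.5 * np.sum(
        lv_p - lv_q + (np.exp(lv_q) + (mu_q - mu_p) ** 2) / np.exp(lv_p) - 1.0
    )

def training_loss(sound_img, rgb_img, target_mask):
    """Collaborative objective: reconstruction + latent alignment + prior."""
    mu_s, lv_s = encode(sound_img, W_snd_mu, b_snd_mu, W_snd_lv, b_snd_lv)
    mu_r, lv_r = encode(rgb_img, W_rgb_mu, b_rgb_mu, W_rgb_lv, b_rgb_lv)
    mask = decode(reparameterize(mu_s, lv_s))
    recon = np.mean((mask - target_mask) ** 2)
    # Pull the sound latent toward the RGB latent (collaborative term).
    align = kl_gauss(mu_s, lv_s, mu_r, lv_r)
    # Standard VAE prior on the sound latent.
    prior = kl_gauss(mu_s, lv_s, np.zeros_like(mu_s), np.zeros_like(lv_s))
    return recon + align + prior

def infer(sound_img):
    """Inference touches only the sound-image weights: no camera needed."""
    mu_s, _ = encode(sound_img, W_snd_mu, b_snd_mu, W_snd_lv, b_snd_lv)
    return decode(mu_s)
```

During training the alignment term transfers shape detail learned from RGB into the sound latent; at inference `infer` uses only the sound encoder and shared decoder, which is what allows camera-free, privacy-preserving deployment.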
Related papers
- UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
- hear-your-action: human action recognition by ultrasound active sensing [3.0277213703725767]
Action recognition is a key technology for many industrial applications.
Privacy issues prevent widespread usage due to the inclusion of private information.
We propose a privacy-preserving action recognition method based on ultrasound active sensing.
arXiv Detail & Related papers (2023-09-15T01:00:55Z)
- No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration [8.710774926703321]
Video and wearable sensors make it possible to recognize speaking in an unobtrusive, privacy-preserving way.
We show that the selection of local features around pose keypoints has a positive effect on generalization performance.
We additionally make use of acceleration measured through wearable sensors for the same task, and present a multimodal approach combining both methods.
arXiv Detail & Related papers (2022-11-01T15:55:48Z)
- Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- Invisible-to-Visible: Privacy-Aware Human Segmentation using Airborne Ultrasound via Collaborative Learning Probabilistic U-Net [8.21448246263952]
We propose a new task for human segmentation from invisible information, especially airborne ultrasound.
Although ultrasound images can roughly identify a person's location, the detailed shape is ambiguous.
We propose a collaborative learning probabilistic U-Net that uses ultrasound and segmentation images simultaneously during training.
arXiv Detail & Related papers (2022-05-11T06:42:24Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z)
- A proto-object based audiovisual saliency map [0.0]
We develop a proto-object based audiovisual saliency map (AVSM) for analysis of dynamic natural scenes.
Such a system can be useful in surveillance, robotic navigation, video compression, and related applications.
arXiv Detail & Related papers (2020-03-15T08:34:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.