Scattering Features for Multimodal Gait Recognition
- URL: http://arxiv.org/abs/2001.08830v1
- Date: Thu, 23 Jan 2020 22:11:38 GMT
- Title: Scattering Features for Multimodal Gait Recognition
- Authors: Srđan Kitić, Gilles Puy, Patrick Pérez, Philippe Gilberton
- Abstract summary: We consider the problem of identifying people on the basis of their walk (gait) pattern.
We rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively.
- Score: 5.3526997662068085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of identifying people on the basis of their walk
(gait) pattern. Classical approaches to tackle this problem are based on, e.g.,
video recordings or piezoelectric sensors embedded in the floor. In this work,
we rely on acoustic and vibration measurements, obtained from a microphone and
a geophone sensor, respectively. The contribution of this work is twofold.
First, we propose a feature extraction method based on an (untrained) shallow
scattering network, specially tailored for the gait signals. Second, we
demonstrate that fusing the two modalities improves identification in the
practically relevant open set scenario.
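The abstract gives no implementation details for the scattering features. Below is a minimal NumPy sketch, under assumed design choices (a Gaussian, Morlet-like filterbank, two scattering orders, time averaging, and fusion by simple feature concatenation), of what an untrained shallow scattering feature extractor for the microphone and geophone signals could look like. It is an illustration, not the authors' pipeline.

```python
# Minimal sketch (an assumption, not the authors' code): an untrained,
# two-order scattering-style feature extractor for 1-D gait signals,
# followed by naive feature-level fusion of the two modalities.
import numpy as np

def filterbank(n, num_filters=16, q=4.0):
    """Gaussian (Morlet-like) band-pass filters in the frequency domain."""
    freqs = np.fft.rfftfreq(n)                        # normalized [0, 0.5]
    centers = 0.4 * 2.0 ** (-np.arange(num_filters))  # log-spaced centers
    return np.stack([np.exp(-0.5 * ((freqs - fc) * q / fc) ** 2)
                     for fc in centers])              # (num_filters, n//2+1)

def lowpass(x, width=64):
    """Local averaging (the scattering low-pass 'phi'), with subsampling."""
    return np.convolve(x, np.ones(width) / width, mode="same")[::width]

def modulus_layer(signals, bank):
    """|x * psi| for every input signal and every band-pass filter."""
    n = (bank.shape[1] - 1) * 2
    out = []
    for x in signals:
        X = np.fft.rfft(x, n=n)
        out.extend(np.abs(np.fft.irfft(X * h, n=n)) for h in bank)
    return out

def scattering_features(x, num_filters=16):
    """Order-0/1/2 scattering coefficients, time-averaged and flattened."""
    bank = filterbank(len(x), num_filters)
    u1 = modulus_layer([x], bank)   # first-order modulus signals
    u2 = modulus_layer(u1, bank)    # second order (reuses the same bank,
                                    # a simplification of true scattering)
    return np.concatenate([lowpass(s) for s in [x] + u1 + u2])

# Toy usage: two synchronized modalities for one footstep segment.
rng = np.random.default_rng(0)
mic, geo = rng.standard_normal(4096), rng.standard_normal(4096)
fused = np.concatenate([scattering_features(mic), scattering_features(geo)])
print(fused.shape)  # a fixed-length descriptor that a classifier could use
```

The concatenation in the last step only stands in for whichever fusion rule the paper actually uses for open-set identification; it is a placeholder, not the reported method.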
Related papers
- The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise [92.53724347718173]
Diffusion models have achieved remarkable success in text-to-image generation tasks.
We identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images.
arXiv Detail & Related papers (2024-06-04T05:06:00Z)
- STMixer: A One-Stage Sparse Action Detector [43.62159663367588]
We propose two core designs for a more flexible one-stage action detector.
First, we propose a query-based adaptive feature sampling module, which endows the detector with the flexibility of mining a group of features from the entire spatio-temporal domain of the video.
Second, we devise a decoupled feature mixing module, which dynamically attends to and mixes features along the spatial and temporal dimensions, respectively, for better feature decoding.
arXiv Detail & Related papers (2024-04-15T14:52:02Z)
- Coherent interaction-free detection of noise [0.0]
We propose interaction-free measurements as a noise-detection technique.
We explore two conceptually different schemes: the coherent and the projective realizations.
We study the signature of noise correlations in the detector's signal.
arXiv Detail & Related papers (2023-12-28T18:24:13Z)
- Opening the Black Box of wav2vec Feature Encoder [2.1219431687928525]
We focus on the convolutional feature encoder, whose latent space is often speculated to represent discrete acoustic units.
To analyze the embedding space in a reductive manner, we feed in synthesized audio signals, each a summation of simple sine waves.
We conclude that various information is embedded inside the feature encoder representations: (1) fundamental frequency, (2) formants, and (3) amplitude, packed with (4) sufficient temporal detail.
arXiv Detail & Related papers (2022-10-27T12:47:35Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space, either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
- Metric-based multimodal meta-learning for human movement identification via footstep recognition [3.300376360949452]
We describe a novel metric-based learning approach that introduces a multimodal framework.
We learn general-purpose representations from small amounts of multisensory data obtained from omnipresent sensing systems.
We employ a metric-based contrastive learning approach for multi-sensor data to mitigate the impact of data scarcity.
arXiv Detail & Related papers (2021-11-15T18:46:14Z)
- WaveFake: A Data Set to Facilitate Audio Deepfake Detection [3.8073142980733]
First, this paper provides an introduction to signal processing techniques used for analyzing audio signals.
Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages.
Third, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.
arXiv Detail & Related papers (2021-11-04T12:26:34Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
arXiv Detail & Related papers (2021-02-23T09:59:31Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.