Drop-DTW: Aligning Common Signal Between Sequences While Dropping
Outliers
- URL: http://arxiv.org/abs/2108.11996v1
- Date: Thu, 26 Aug 2021 18:52:35 GMT
- Title: Drop-DTW: Aligning Common Signal Between Sequences While Dropping
Outliers
- Authors: Nikita Dvornik and Isma Hadji and Konstantinos G. Derpanis and Animesh
Garg and Allan D. Jepson
- Abstract summary: We introduce Drop-DTW, a novel algorithm that aligns the common signal between the sequences while automatically dropping the outlier elements from the matching.
In our experiments, we show that Drop-DTW is a robust similarity measure for sequence retrieval and demonstrate its effectiveness as a training loss on diverse applications.
- Score: 33.174893836302005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we consider the problem of sequence-to-sequence alignment for
signals containing outliers. Assuming the absence of outliers, the standard
Dynamic Time Warping (DTW) algorithm efficiently computes the optimal alignment
between two (generally) variable-length sequences. While DTW is robust to
temporal shifts and dilations of the signal, it fails to align sequences in a
meaningful way in the presence of outliers that can be arbitrarily interspersed
in the sequences. To address this problem, we introduce Drop-DTW, a novel
algorithm that aligns the common signal between the sequences while
automatically dropping the outlier elements from the matching. The entire
procedure is implemented as a single dynamic program that is efficient and
fully differentiable. In our experiments, we show that Drop-DTW is a robust
similarity measure for sequence retrieval and demonstrate its effectiveness as
a training loss on diverse applications. With Drop-DTW, we address temporal
step localization on instructional videos, representation learning from noisy
videos, and cross-modal representation learning for audio-visual retrieval and
localization. In all applications, we take a weakly- or unsupervised approach
and demonstrate state-of-the-art results under these settings.
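The dropping mechanism described in the abstract can be illustrated as a small dynamic program. The sketch below augments the standard DTW recursion with per-element drop costs for both sequences, so an element is skipped whenever dropping it is cheaper than matching it. This is a simplified, hypothetical reconstruction for illustration only, not the authors' implementation (the paper's full formulation and its differentiable relaxation involve additional coupled DP tables); the names `drop_dtw`, `cost`, `drop_x`, and `drop_z` are assumptions of this sketch.

```python
import numpy as np

def drop_dtw(cost, drop_x, drop_z):
    """Simplified Drop-DTW-style alignment cost.

    cost   : (n, m) array of pairwise match costs between sequences x and z
    drop_x : (n,) per-element cost of dropping elements of x
    drop_z : (m,) per-element cost of dropping elements of z
    Returns the minimal total cost of aligning the common signal,
    where either sequence may drop (skip) elements at its drop cost.
    """
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    # A prefix facing an empty sequence can only be dropped entirely.
    D[1:, 0] = np.cumsum(drop_x)
    D[0, 1:] = np.cumsum(drop_z)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard DTW step: match x[i-1] with z[j-1].
            match = cost[i - 1, j - 1] + min(D[i - 1, j - 1],
                                             D[i - 1, j],
                                             D[i, j - 1])
            # Drop steps: skip an element of either sequence at its drop cost.
            D[i, j] = min(match,
                          drop_x[i - 1] + D[i - 1, j],
                          drop_z[j - 1] + D[i, j - 1])
    return D[n, m]
```

For example, aligning x = [1, 2, 9, 3] with z = [1, 2, 3] under unit drop costs skips the outlier 9 (drop cost 1) instead of matching it (match cost at least 6), whereas plain DTW would be forced to absorb the outlier into the alignment.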
Related papers
- Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment [3.2873782624127834]
We present a self-supervised method for representation learning based on aligning temporal video sequences.
We introduce the novel Local-Alignment Contrastive (LAC) loss, which incorporates a differentiable local alignment loss to capture local temporal dependencies.
We show that our learned representations outperform existing state-of-the-art approaches on action recognition tasks.
arXiv Detail & Related papers (2024-09-06T20:32:53Z) - Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - TheGlueNote: Learned Representations for Robust and Flexible Note Alignment [3.997809845676912]
We show how a transformer encoder network, TheGlueNote, predicts pairwise note similarities for two 512-note subsequences.
Our approach performs on par with the state of the art in terms of note alignment accuracy, is considerably more robust to version mismatches, and works directly on any pair of MIDI files.
arXiv Detail & Related papers (2024-08-08T08:42:30Z) - Deep Declarative Dynamic Time Warping for End-to-End Learning of
Alignment Paths [54.53208538517505]
This paper addresses learning end-to-end models for time series data that include a temporal alignment step via dynamic time warping (DTW).
We propose a DTW layer based around bi-level optimisation and deep declarative networks, which we name DecDTW.
We show that this property is particularly useful for applications where downstream loss functions are defined on the optimal alignment path itself.
arXiv Detail & Related papers (2023-03-19T21:58:37Z) - Approximating DTW with a convolutional neural network on EEG data [9.409281517596396]
We propose a fast and differentiable approximation of Dynamic Time Warping (DTW).
We show that our method achieves at least the same level of accuracy as other main DTW approximations, with higher computational efficiency.
arXiv Detail & Related papers (2023-01-30T13:27:47Z) - Scaling Multimodal Pre-Training via Cross-Modality Gradient
Harmonization [68.49738668084693]
Self-supervised pre-training has recently demonstrated success on large-scale multimodal data.
Cross-modality alignment (CMA) provides only weak and noisy supervision.
CMA might cause conflicts and biases among modalities.
arXiv Detail & Related papers (2022-11-03T18:12:32Z) - Fine-grained Temporal Contrastive Learning for Weakly-supervised
Temporal Action Localization [87.47977407022492]
This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in weakly-supervised action localization.
Under a differentiable dynamic programming formulation, two complementary contrastive objectives are designed, including Fine-grained Sequence Distance (FSD) contrasting and Longest Common Subsequence (LCS) contrasting.
Our method achieves state-of-the-art performance on two popular benchmarks.
arXiv Detail & Related papers (2022-03-31T05:13:50Z) - Learning to Align Sequential Actions in the Wild [123.62879270881807]
We propose an approach to align sequential actions in the wild that involve diverse temporal variations.
Our model accounts for both monotonic and non-monotonic sequences.
We demonstrate that our approach consistently outperforms the state-of-the-art in self-supervised sequential action representation learning.
arXiv Detail & Related papers (2021-11-17T18:55:36Z) - ASCNet: Self-supervised Video Representation Learning with
Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use a contrastive loss with video clips as the instances and learn visual representations by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learning robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z) - Representation Learning via Global Temporal Alignment and
Cycle-Consistency [20.715813546383178]
We introduce a weakly supervised method for representation learning based on aligning temporal sequences.
We report significant performance increases over previous methods.
In addition, we report two applications of our temporal alignment framework, namely 3D pose reconstruction and fine-grained audio/visual retrieval.
arXiv Detail & Related papers (2021-05-11T17:34:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.