Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory
- URL: http://arxiv.org/abs/2008.02734v1
- Date: Tue, 4 Aug 2020 15:00:33 GMT
- Title: Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory
- Authors: Christopher Tralie, Elizabeth Dempsey
- Abstract summary: We present a divide and conquer algorithm that computes the exact globally optimal DTW alignment using O(M+N) memory.
Our algorithm can be parallelized up to a factor of min(M, N) with the same memory constraints, so it can still run more efficiently than the textbook version with an adequate GPU.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio alignment is a fundamental preprocessing step in many MIR pipelines.
For two audio clips with M and N frames, respectively, the most popular
approach, dynamic time warping (DTW), has O(MN) requirements in both memory and
computation, which is prohibitive for frame-level alignments at reasonable
rates. To address this, a variety of memory efficient algorithms exist to
approximate the optimal alignment under the DTW cost. To our knowledge,
however, no exact algorithms exist that are guaranteed to break the quadratic
memory barrier. In this work, we present a divide and conquer algorithm that
computes the exact globally optimal DTW alignment using O(M+N) memory. Its
runtime is still O(MN), trading off memory for a 2x increase in computation.
However, the algorithm can be parallelized up to a factor of min(M, N) with the
same memory constraints, so it can still run more efficiently than the textbook
version with an adequate GPU. We use our algorithm to compute exact alignments
on a collection of orchestral music, which we use as ground truth to benchmark
the alignment accuracy of several popular approximate alignment schemes at
scales that were not previously possible.
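To make the abstract's memory trade-off concrete, the following is a minimal Python/NumPy sketch of a Hirschberg-style divide-and-conquer DTW in the spirit of the paper: it recovers an exact optimal warping path while only ever holding a few length-N cost rows plus the O(M+N) path itself. It is an illustration of the idea, not the authors' implementation; the function names and the Euclidean frame distance are assumptions.

```python
import numpy as np


def dtw_cost_last_row(X, Y):
    """Accumulated DTW cost from (0, 0) to every cell of the last row.

    Frame distances are computed on the fly, so only two length-N rows
    are stored (O(N) memory) instead of the full O(MN) table."""
    M, N = len(X), len(Y)
    dist = lambda i, j: np.linalg.norm(X[i] - Y[j])
    prev = np.empty(N)
    prev[0] = dist(0, 0)
    for j in range(1, N):
        prev[j] = prev[j - 1] + dist(0, j)
    for i in range(1, M):
        curr = np.empty(N)
        curr[0] = prev[0] + dist(i, 0)
        for j in range(1, N):
            curr[j] = dist(i, j) + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev


def dtw_align(X, Y):
    """Exact optimal DTW path with O(M + N) working memory, via Hirschberg-style
    divide and conquer (a sketch of the idea, not the authors' implementation)."""
    M, N = len(X), len(Y)
    if M <= 2 or N <= 2:
        return _dtw_small(X, Y)  # tiny subproblem: textbook DP is cheap here
    m = M // 2
    d_mid = np.array([np.linalg.norm(X[m] - y) for y in Y])
    fwd = dtw_cost_last_row(X[: m + 1], Y)                # cost (0,0) -> (m, j)
    bwd = dtw_cost_last_row(X[m:][::-1], Y[::-1])[::-1]   # cost (m, j) -> (M-1, N-1)
    # Every warping path crosses row m; the column below lies on an optimal path.
    # dist(m, j) is counted by both halves, so subtract it once.
    j_star = int(np.argmin(fwd + bwd - d_mid))
    del fwd, bwd, d_mid  # release the O(N) rows before recursing
    upper = dtw_align(X[: m + 1], Y[: j_star + 1])
    lower = dtw_align(X[m:], Y[j_star:])
    return upper + [(i + m, j + j_star) for (i, j) in lower[1:]]


def _dtw_small(X, Y):
    """Textbook O(MN) DTW with backtracking, used only on tiny base cases."""
    M, N = len(X), len(Y)
    C = np.full((M, N), np.inf)
    for i in range(M):
        for j in range(N):
            d = np.linalg.norm(X[i] - Y[j])
            best = 0.0 if i == 0 and j == 0 else np.inf
            if i > 0:
                best = min(best, C[i - 1, j])
            if j > 0:
                best = min(best, C[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, C[i - 1, j - 1])
            C[i, j] = d + best
    i, j, path = M - 1, N - 1, [(M - 1, N - 1)]
    while (i, j) != (0, 0):
        moves = [(C[i - 1, j - 1], i - 1, j - 1)] if i > 0 and j > 0 else []
        if i > 0:
            moves.append((C[i - 1, j], i - 1, j))
        if j > 0:
            moves.append((C[i, j - 1], i, j - 1))
        _, i, j = min(moves)
        path.append((i, j))
    return path[::-1]
```

For feature sequences X (M frames) and Y (N frames), dtw_align(X, Y) returns the optimal warping path as a list of (i, j) index pairs using only O(M+N) working memory beyond the inputs; the forward and backward cost passes at each split, and the two recursive halves, are independent, which hints at why the computation can be parallelized, although the paper's GPU scheme may differ in detail.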
Related papers
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrarily small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z) - Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z) - Fast offset corrected in-memory training [0.0]
We propose and describe two new and improved algorithms for in-memory computing.
Chopped-TTv2 (c-TTv2) and Analog Gradient Accumulation with Dynamic reference (AGAD) retain the same runtime complexity but correct for any remaining offsets using choppers.
arXiv Detail & Related papers (2023-03-08T17:07:09Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Cumulative Memory Lower Bounds for Randomized and Quantum Computation [1.52292571922932]
Cumulative memory, the sum of the space in use over the steps of a computation, is a measure of time-space complexity.
We prove the first lower bounds on cumulative memory complexity for both sequential classical computation and quantum circuits.
arXiv Detail & Related papers (2023-01-13T17:57:02Z) - Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-valued feature descriptors have achieved high accuracy, but they are inefficient due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by $0.25\times$.
arXiv Detail & Related papers (2022-07-13T02:44:05Z) - A Push-Relabel Based Additive Approximation for Optimal Transport [5.111364864495785]
Exact algorithms for computing Optimal Transport can be slow.
We introduce a new and very simple approach to find an $\varepsilon$-approximation of the OT distance.
Our algorithm achieves a near-optimal execution time of $O(n^2/\varepsilon^2)$ for computing the OT distance.
arXiv Detail & Related papers (2022-03-07T21:40:14Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering [6.4805900740861]
We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$^*$.
We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time).
Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude.
arXiv Detail & Related papers (2021-04-02T16:05:00Z) - An Efficient Shared-memory Parallel Sinkhorn-Knopp Algorithm to Compute the Word Mover's Distance [0.18275108630751835]
The Word Mover's Distance (WMD) is a metric that measures the semantic dissimilarity between two text documents.
We present a shared-memory parallel algorithm to compute WMD of one document against many other documents (a minimal serial Sinkhorn-Knopp iteration is sketched after this list for context).
arXiv Detail & Related papers (2020-05-14T05:30:18Z)
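As background for the two optimal-transport entries above (the entropy-regularized OT paper and the Sinkhorn-Knopp-based Word Mover's Distance paper), here is a minimal serial Sinkhorn-Knopp iteration; the function name, parameters, and fixed iteration count are illustrative assumptions, not code from either paper.

```python
import numpy as np


def sinkhorn(a, b, C, reg=0.1, n_iter=200):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp scaling.

    a, b : source / target probability vectors (each sums to 1)
    C    : cost matrix, C[i, j] = ground distance between bin i and bin j
    reg  : entropic regularization strength (smaller = closer to exact OT)
    Returns the transport plan P and the associated transport cost <P, C>."""
    K = np.exp(-C / reg)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)               # rescale so columns of P match b
        u = a / (K @ v)                 # rescale so rows of P match a
    P = u[:, None] * K * v[None, :]     # plan with (approximately) correct marginals
    return P, float(np.sum(P * C))
```

For the Word Mover's Distance, a and b would be the normalized word-frequency histograms of two documents and C the pairwise distances between their word embeddings; the parallel algorithm in the entry above computes many such problems for one document against many others.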
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.