An Efficient Mini-batch Method via Partial Transportation
- URL: http://arxiv.org/abs/2108.09645v1
- Date: Sun, 22 Aug 2021 05:45:48 GMT
- Title: An Efficient Mini-batch Method via Partial Transportation
- Authors: Khai Nguyen, Dang Nguyen, Tung Pham, Nhat Ho
- Abstract summary: Mini-batch optimal transport (m-OT) has been widely used to deal with the memory issue of OT in large-scale applications.
We propose a novel mini-batch method by using partial optimal transport (POT) between mini-batch empirical measures.
We show that m-POT is better than m-OT in deep domain adaptation applications while having comparable performance with m-UOT.
- Score: 10.127116789814488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mini-batch optimal transport (m-OT) has been widely used recently to deal
with the memory issue of OT in large-scale applications. Despite its
practicality, m-OT suffers from misspecified mappings, namely, mappings that
are optimal on the mini-batch level but do not exist in the optimal
transportation plan between the original measures. To address the misspecified
mappings issue, we propose a novel mini-batch method by using partial optimal
transport (POT) between mini-batch empirical measures, which we refer to as
mini-batch partial optimal transport (m-POT). Leveraging the insight from the
partial transportation, we explain the source of misspecified mappings in
m-OT and motivate why limiting the amount of transported mass among
mini-batches via POT can alleviate the incorrect mappings. Finally, we carry
out extensive experiments on various applications to compare m-POT with m-OT
and a recently proposed mini-batch method, mini-batch unbalanced optimal
transport (m-UOT). We observe that m-POT is better than m-OT in deep domain
adaptation applications while having comparable performance with m-UOT. On
other applications, such as deep generative models, gradient flow, and color
transfer, m-POT yields more favorable performance than both m-OT and m-UOT.
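To make the proposed estimator concrete, below is a minimal sketch of mini-batch partial optimal transport, assuming the Python Optimal Transport (POT) library; the function name mpot_loss, the uniform mini-batch sampling, and the transported-mass fraction s are illustrative assumptions, not the authors' released code.
```python
# Minimal sketch of mini-batch partial optimal transport (m-POT), assuming the
# Python Optimal Transport (POT) library (pip install pot). The function name,
# sampling scheme, and mass fraction s are illustrative, not the paper's code.
import numpy as np
import ot
from ot.partial import partial_wasserstein2

def mpot_loss(X, Y, batch_size=64, n_batches=8, s=0.75, seed=None):
    """Average partial-OT cost over random mini-batch pairs drawn from X and Y."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_batches):
        xb = X[rng.choice(len(X), size=batch_size, replace=False)]
        yb = Y[rng.choice(len(Y), size=batch_size, replace=False)]
        a = ot.unif(batch_size)   # uniform weights on the source mini-batch
        b = ot.unif(batch_size)   # uniform weights on the target mini-batch
        M = ot.dist(xb, yb)       # pairwise squared-Euclidean cost matrix
        # Transport only a fraction s of the total mass; limiting the mass is
        # what lets partial OT discard likely-misspecified mini-batch matches.
        total += partial_wasserstein2(a, b, M, m=s)
    return total / n_batches
```
With X and Y given as numpy arrays of shape (n, d) and (m, d), the returned scalar plays the role of the mini-batch loss, and setting s = 1 should recover plain m-OT on each mini-batch pair.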
Related papers
- Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training [78.93900796545523]
Mini-Sequence Transformer (MsT) is a methodology for highly efficient and accurate LLM training with extremely long sequences.
MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage.
Integrated with the Hugging Face library, MsT successfully extends the maximum context length of Qwen, Mistral, and Gemma-2 by 12-24x.
arXiv Detail & Related papers (2024-07-22T01:52:30Z) - Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
PaLoRA is a novel parameter-efficient method that augments the original model with task-specific low-rank adapters.
Our experimental results show that PaLoRA outperforms MTL and PFL baselines across various datasets.
arXiv Detail & Related papers (2024-07-10T21:25:51Z) - MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies [85.57899012821211]
Small Language Models (SLMs) are a resource-efficient alternative to Large Language Models (LLMs).
We introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants.
We also introduce MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K.
arXiv Detail & Related papers (2024-04-09T15:36:50Z) - AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning [112.97430455461097]
We propose a general PEFT method that tunes a mixture of adaptation modules introduced in each Transformer layer while keeping most of the PLM weights frozen.
By only tuning 0.1-0.2% of PLM parameters, we show that AdaMix outperforms SOTA parameter-efficient fine-tuning and full model fine-tuning for both NLU and NLG tasks.
arXiv Detail & Related papers (2022-10-31T16:23:36Z) - Budget-Constrained Bounds for Mini-Batch Estimation of Optimal Transport [35.440243358517066]
We introduce novel families of upper and lower bounds for the Optimal Transport problem constructed by aggregating solutions of mini-batch OT problems.
The upper bound family contains traditional mini-batch averaging at one extreme and a tight bound found by optimal coupling of mini-batches at the other.
Through various experiments, we explore the trade-off between computational budget and bound tightness and show the usefulness of these bounds in computer vision applications.
arXiv Detail & Related papers (2022-10-24T22:12:17Z) - Low-rank Optimal Transport: Approximation, Statistics and Debiasing [51.50788603386766]
The low-rank optimal transport (LOT) approach was advocated in Scetbon et al. (2021).
LOT is seen as a legitimate contender to entropic regularization when compared on properties of interest.
We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
arXiv Detail & Related papers (2022-05-24T20:51:37Z) - Approximating Optimal Transport via Low-rank and Sparse Factorization [19.808887459724893]
Optimal transport (OT) naturally arises in a wide range of machine learning applications but may often become the computational bottleneck.
A novel approximation for OT is proposed, in which the transport plan can be decomposed into the sum of a low-rank matrix and a sparse one.
arXiv Detail & Related papers (2021-11-12T03:10:45Z) - Learning Space Partitions for Path Planning [54.475949279050596]
PlaLaM outperforms existing path planning methods in 2D navigation tasks, especially in the presence of difficult-to-escape local optima.
These gains transfer to highly multimodal real-world tasks, where we outperform strong baselines in compiler phase ordering by up to 245% and in molecular design by up to 0.4 on properties on a 0-1 scale.
arXiv Detail & Related papers (2021-06-19T18:06:11Z) - Unbalanced minibatch Optimal Transport; applications to Domain Adaptation [8.889304968879163]
Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions.
We argue that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior.
Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.
arXiv Detail & Related papers (2021-03-05T11:15:47Z) - BoMb-OT: On Batch of Mini-batches Optimal Transport [23.602237930502948]
Mini-batch optimal transport (m-OT) has been successfully used in practical applications that involve probability measures with intractable density.
We propose a novel mini-batching scheme for optimal transport, named Batch of Mini-batches Optimal Transport (BoMb-OT).
We show that the new mini-batching scheme can estimate a better transportation plan between two original measures than m-OT.
arXiv Detail & Related papers (2021-02-11T09:56:25Z) - Minibatch optimal transport distances; analysis and applications [9.574645423576932]
Optimal transport distances have become a classic tool to compare probability distributions and have found many applications in machine learning.
A common workaround is to compute these distances on minibatches to average the outcome of several smaller optimal transport problems.
In this paper, we propose an extended analysis of this practice, whose effects were previously studied only in restricted cases; a minimal sketch of this mini-batch averaging workaround appears after this list.
arXiv Detail & Related papers (2021-01-05T21:29:31Z)
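For reference, the plain mini-batch averaging workaround discussed in the last entry above is the m-OT baseline that m-POT and m-UOT are compared against. A minimal sketch, again assuming the Python Optimal Transport (POT) library, with illustrative names rather than code from any of the cited papers:
```python
# Minimal sketch of plain mini-batch OT (m-OT): average the exact OT cost over
# several random mini-batch pairs. Assumes the POT library; names illustrative.
import numpy as np
import ot

def mot_loss(X, Y, batch_size=64, n_batches=8, seed=None):
    """Average exact OT cost over random mini-batch pairs drawn from X and Y."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_batches):
        xb = X[rng.choice(len(X), size=batch_size, replace=False)]
        yb = Y[rng.choice(len(Y), size=batch_size, replace=False)]
        M = ot.dist(xb, yb)  # pairwise squared-Euclidean cost matrix
        # Exact OT forces every mini-batch point to be matched, which is the
        # source of the misspecified mappings that m-POT limits via partial OT.
        total += ot.emd2(ot.unif(batch_size), ot.unif(batch_size), M)
    return total / n_batches
```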