Distributed bundle adjustment with block-based sparse matrix compression
for super large scale datasets
- URL: http://arxiv.org/abs/2307.08383v2
- Date: Sun, 13 Aug 2023 06:47:15 GMT
- Title: Distributed bundle adjustment with block-based sparse matrix compression
for super large scale datasets
- Authors: Maoteng Zheng, Nengcheng Chen, Junfeng Zhu, Xiaoru Zeng, Huanbin Qiu,
Yuyao Jiang, Xingyue Lu, Hao Qu
- Abstract summary: We propose a distributed bundle adjustment (DBA) method using the exact Levenberg-Marquardt (LM) algorithm for super large-scale datasets.
For the first time, we conducted parallel bundle adjustment using the LM algorithm on a real dataset with 1.18 million images and a synthetic dataset with 10 million images.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a distributed bundle adjustment (DBA) method using the exact
Levenberg-Marquardt (LM) algorithm for super large-scale datasets. Most of the
existing methods partition the global map to small ones and conduct bundle
adjustment in the submaps. To fit the parallel framework, they use
approximate solutions instead of the exact LM algorithm, which often yields
sub-optimal results. In contrast, we use the exact LM
algorithm to conduct global bundle adjustment, where the formation of the
reduced camera system (RCS) is actually parallelized and executed in a
distributed way. To store the large RCS, we compress it with a block-based
sparse matrix compression format (BSMC), which fully exploits its block
feature. The BSMC format also enables the distributed storage and updating of
the global RCS. The proposed method is extensively evaluated and compared with
the state-of-the-art pipelines using both synthetic and real datasets.
Preliminary results demonstrate the efficient memory usage and vast scalability
of the proposed method compared with the baselines. For the first time, we
conducted parallel bundle adjustment using the LM algorithm on a real dataset with
1.18 million images and a synthetic dataset with 10 million images (about 500
times the size handled by the state-of-the-art LM-based BA) on a distributed computing
system.
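To make the storage idea concrete, below is a minimal Python sketch of a block-based sparse container for the symmetric reduced camera system: only nonzero camera-camera blocks are kept, duplicate block contributions (e.g. partial sums from different workers) are accumulated, and LM damping touches only the diagonal blocks. The class name, the 6x6 block size, and the method names are illustrative assumptions, not the authors' BSMC implementation or API.

# Minimal sketch of block-based sparse storage for the reduced camera system
# (RCS), loosely inspired by the BSMC idea described in the abstract above.
# All names and the block size are illustrative assumptions.
import numpy as np

class BlockSparseRCS:
    """Stores only the nonzero camera-camera blocks of a symmetric RCS."""

    def __init__(self, num_cameras, block_size=6):
        self.num_cameras = num_cameras
        self.block_size = block_size
        # Map (row_cam, col_cam) with row_cam <= col_cam to a dense block.
        self.blocks = {}

    def accumulate(self, i, j, block):
        """Add a dense block contribution; duplicate keys are summed, which is
        what a distributed reduction over per-worker partial RCS blocks does."""
        key = (i, j) if i <= j else (j, i)
        blk = block if i <= j else block.T
        if key in self.blocks:
            self.blocks[key] += blk
        else:
            self.blocks[key] = blk.copy()

    def add_damping(self, lam):
        """LM damping: add lam to the diagonal of every diagonal block."""
        for i in range(self.num_cameras):
            key = (i, i)
            if key not in self.blocks:
                self.blocks[key] = np.zeros((self.block_size, self.block_size))
            self.blocks[key] += lam * np.eye(self.block_size)

    def to_dense(self):
        """Expand to a dense matrix (only sensible for tiny test problems)."""
        n = self.num_cameras * self.block_size
        dense = np.zeros((n, n))
        b = self.block_size
        for (i, j), blk in self.blocks.items():
            dense[i*b:(i+1)*b, j*b:(j+1)*b] = blk
            if i != j:
                dense[j*b:(j+1)*b, i*b:(i+1)*b] = blk.T
        return dense

# Tiny usage example: two workers contribute to the same off-diagonal block.
rcs = BlockSparseRCS(num_cameras=3)
rng = np.random.default_rng(0)
for _ in range(2):                      # pretend these are two workers
    blk = rng.standard_normal((6, 6))
    rcs.accumulate(0, 1, blk)           # camera pair (0, 1)
rcs.add_damping(1e-3)
print(rcs.to_dense().shape)             # (18, 18)

The memory saving of such a layout comes from storing only blocks that correspond to camera pairs sharing observations, and the per-block keying is what makes distributed storage and in-place updating of the global RCS straightforward.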
Related papers
- Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrary small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
arXiv Detail & Related papers (2024-10-22T17:59:30Z) - SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators [25.229269944770678]
Large Language Models (LLMs) have transformed natural language processing, but face challenges in widespread deployment due to their high runtime cost.
We introduce SeedLM, a novel post-training compression method that uses seeds of pseudo-random generators to encode and compress model weights.
arXiv Detail & Related papers (2024-10-14T16:57:23Z) - ARB-LLM: Alternating Refined Binarizations for Large Language Models [82.24826360906341]
ARB-LLM is a novel 1-bit post-training quantization (PTQ) technique tailored for Large Language Models (LLMs)
As a binary PTQ method, our ARB-LLM$_\text{RC}$ is the first to surpass FP16 models of the same size.
arXiv Detail & Related papers (2024-10-04T03:50:10Z) - OneBit: Towards Extremely Low-bit Large Language Models [66.29839811207617]
This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs.
Experiments indicate that OneBit achieves good performance (at least 81% of the non-quantized performance on LLaMA models) with robust training processes.
arXiv Detail & Related papers (2024-02-17T14:26:57Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLMs families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z) - Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models
in Federated Learning [0.22499166814992444]
This paper proposes a new distributed Markov Chain Monte Carlo (MCMC) inference method for DPMMs (DisCGS) using sufficient statistics.
Our approach uses the collapsed Gibbs sampler and is specifically designed to work on distributed data across independent and heterogeneous machines.
For instance, with a dataset of 100K data points, the centralized algorithm requires approximately 12 hours to complete 100 iterations while our approach achieves the same number of iterations in just 3 minutes.
arXiv Detail & Related papers (2023-12-18T13:16:18Z) - μSplit: efficient image decomposition for microscopy data [50.794670705085835]
muSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context.
We apply muSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Stochastic Bundle Adjustment for Efficient and Scalable 3D
Reconstruction [43.736296034673124]
Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck in solving the Reduced Camera System (RCS) whose dimension is proportional to the camera number.
We propose a bundle adjustment algorithm which seeks to decompose the RCS approximately inside the LM to improve the efficiency and scalability.
arXiv Detail & Related papers (2020-08-02T10:26:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.