Related papers: Bundle Adjustment in the Eager Mode

Related papers

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs [11.45717904490388]
Recent advances in transformer-based foundation models have made them the default choice for many tasks.<n>Their rapidly growing size makes fitting a full model on a single GPU increasingly difficult and their computational cost prohibitive.<n>Block low-rank (BLR) compression techniques address this challenge by learning compact representations of weight matrices.
arXiv Detail & Related papers (2025-12-24T00:41:13Z)
StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold [51.93627542334909]
Low-rank adaptation (LoRA) has been widely adopted as a parameter-efficient technique for fine-tuning large-scale pre-trained models.<n>We propose a geometry-aware extension of LoRA that uses a three-factor decomposition $U!SVtop$.
arXiv Detail & Related papers (2025-10-02T11:59:13Z)
FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting [57.97160965244424]
3D Gaussian splatting (3DGS) has enabled various applications in 3D scene representation and novel view synthesis.<n>Previous approaches have focused on pruning less important Gaussians, effectively compressing 3DGS.<n>We present an elastic inference method for 3DGS, achieving substantial rendering performance without additional fine-tuning.
arXiv Detail & Related papers (2025-06-04T17:17:57Z)
Second-order Optimization of Gaussian Splats with Importance Sampling [51.95046424364725]
3D Gaussian Splatting (3DGS) is widely used for novel view rendering due to its high quality and fast inference time. We propose a novel second-order optimization strategy based on Levenberg-Marquardt (LM) and Conjugate Gradient (CG) Our method achieves a $3times$ speedup over standard LM and outperforms Adam by $6times$ when the Gaussian count is low.
arXiv Detail & Related papers (2025-04-17T12:52:08Z)
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing [53.295515505026096]
Janus-Pro-driven Prompt Parsing is a prompt- parsing module that bridges text understanding and layout generation. MIGLoRA is a parameter-efficient plug-in integrating Low-Rank Adaptation into UNet (SD1.5) and DiT (SD3) backbones. The proposed method achieves state-of-the-art performance on COCO and LVIS benchmarks while maintaining parameter efficiency.
arXiv Detail & Related papers (2025-03-27T00:59:14Z)
LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training [0.21756081703275998]
LiteGS is a high-performance and modular framework that enhances both the efficiency and usability of Gaussian splatting. LiteGS achieves a 3.4x speedup over the original 3DGS implementation while reducing memory usage by approximately 30%.
arXiv Detail & Related papers (2025-03-03T05:52:02Z)
SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images [125.66499135980344]
We propose SparseGrasp, a novel open-vocabulary robotic grasping system. SparseGrasp operates efficiently with sparse-view RGB images and handles scene updates fastly. We show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability.
arXiv Detail & Related papers (2024-12-03T03:56:01Z)
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency. We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs) We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
Enabling Efficient On-Device Fine-Tuning of LLMs Using Only Inference Engines [17.539008562641303]
Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. Next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specific data. Fine-tuning on resource-constrained edge devices presents significant challenges due to substantial memory and computational demands.
arXiv Detail & Related papers (2024-09-23T20:14:09Z)
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence [50.417261057533786]
fVDB is a novel framework for deep learning on large-scale 3D data. Our framework is fully integrated with PyTorch enabling interoperability with existing pipelines.
arXiv Detail & Related papers (2024-07-01T20:20:33Z)
Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization [71.35604981129838]
Traditional gradient-based bi-level optimization algorithms are ill-suited to meet the demands of large-scale applications. We introduce $(textFG)2textU$, which achieves an unbiased approximation of the meta gradient for bi-level optimization. $(textFG)2textU$ is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems.
arXiv Detail & Related papers (2024-06-20T08:21:52Z)
Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters [5.190794062263327]
Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. We propose Pipette, which is an automatic fine-grained LLM training for real-world clusters.
arXiv Detail & Related papers (2024-05-28T11:59:44Z)
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment [56.44025052765861]
Large language models (LLMs) have revolutionized Natural Language Processing (NLP), but their size creates computational bottlenecks. We introduce a novel approach to create accurate, sparse foundational versions of performant LLMs. We show a total speedup on CPUs for sparse-quantized LLaMA models of up to 8.6x.
arXiv Detail & Related papers (2024-05-06T16:03:32Z)
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity [12.663030430488922]
We propose Flash-LLM for enabling low-cost and highly-efficient large generative model inference on high-performance Cores. At SpMM kernel level, Flash-LLM significantly outperforms the state-of-the-art library, i.e., Sputnik and SparTA by an average of 2.9x and 1.5x, respectively.
arXiv Detail & Related papers (2023-09-19T03:20:02Z)
Large-Batch, Iteration-Efficient Neural Bayesian Design Optimization [37.339567743948955]
We present a novel Bayesian optimization framework specifically tailored to address the limitations of BO. Our key contribution is a highly scalable, sample-based acquisition function that performs a non-dominated sorting of objectives. We show that our acquisition function in combination with different Bayesian neural network surrogates is effective in data-intensive environments with a minimal number of iterations.
arXiv Detail & Related papers (2023-06-01T19:10:57Z)
Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX [5.856836693166898]
We use the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level API. Our method can extend use of value iteration to settings that were previously considered infeasible or impractical. We compare the performance of the optimal replenishment policies to policies, fitted using simulation optimization in JAX which allowed the parallel evaluation of multiple candidate policy parameters.
arXiv Detail & Related papers (2023-03-19T14:20:44Z)
Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation [35.765304656180355]
We study efficient architecture design for real-time multi-person pose estimation on edge. Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation. We introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.
arXiv Detail & Related papers (2022-05-03T02:08:04Z)
MegBA: A High-Performance and Distributed Library for Large-Scale Bundle Adjustment [4.719974460724886]
MegBA is a high-performance and distributed library for large-scale Bundle Adjustment. It can out-perform state-of-the-art BA libraries by up to 33x and 3.3x respectively in public large-scale BA benchmarks.
arXiv Detail & Related papers (2021-12-02T15:50:18Z)
Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers [65.60007071024629]
We show that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z)
Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
MPLP++: Fast, Parallel Dual Block-Coordinate Ascent for Dense Graphical Models [96.1052289276254]
This work introduces a new MAP-solver, based on the popular Dual Block-Coordinate Ascent principle. Surprisingly, by making a small change to the low-performing solver, we derive the new solver MPLP++ that significantly outperforms all existing solvers by a large margin.
arXiv Detail & Related papers (2020-04-16T16:20:53Z)
Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models. This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.