3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency
- URL: http://arxiv.org/abs/2510.18905v2
- Date: Wed, 29 Oct 2025 17:57:23 GMT
- Title: 3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency
- Authors: Minseok Jung, Abhas Ricky, Muhammad Rameez Chatni
- Abstract summary: We introduce a 3D optimization framework that jointly calibrates accuracy, cost, and latency within a unified decision space. We show that knee-point optimization achieves the best balance, while accuracy-maximization remains favorable when precision is prioritized.
- Score: 1.376408511310322
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: AI inference scaling is often tuned through 1D heuristics (a fixed number of reasoning passes) or 2D bivariate trade-offs (e.g., performance vs. compute), which fail to consider cost and latency constraints. We introduce a 3D optimization framework that jointly calibrates accuracy, cost, and latency within a unified decision space, enabling constraint-aware inference scaling. Using Monte Carlo simulations across three representative scenarios and nine simulated large language models, we evaluate four optimization methods to address the 3D multi-objective optimization (MOO) problem. Framing inference scaling as MOO shapes a feasible space that 1D and 2D optimizations fail to capture, enabling environment-adaptive selection of the inference scaling k. Results show that knee-point optimization achieves the best balance, while accuracy-maximization remains favorable when precision is prioritized. The framework establishes a theoretical foundation for deployment-aware inference scaling across diverse operational contexts.
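The abstract's knee-point selection over a (accuracy, cost, latency) decision space can be illustrated with a minimal sketch. This is not the paper's exact method; the candidate values of k, the measurements, and the knee criterion (closest Pareto point to the normalized utopia point) are all assumptions for illustration.

```python
# Hedged sketch: pick an inference-scaling value k by (1) filtering
# candidates down to the Pareto-optimal set over accuracy (maximize),
# cost and latency (minimize), then (2) choosing the "knee" as the
# front member closest to the utopia point after min-max normalization.

def pareto_front(points):
    """Return indices of points not dominated by any other point.
    Point j dominates point i if it has accuracy >= and cost <= and
    latency <= those of i, and differs in at least one objective."""
    front = []
    for i, (acc_i, cost_i, lat_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and cost_j <= cost_i and lat_j <= lat_i
            and (acc_j, cost_j, lat_j) != (acc_i, cost_i, lat_i)
            for j, (acc_j, cost_j, lat_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def knee_point(points, front):
    """Return the front index closest (Euclidean) to the utopia point
    (best accuracy, lowest cost, lowest latency) after min-max
    normalizing each objective over the front."""
    accs  = [points[i][0] for i in front]
    costs = [points[i][1] for i in front]
    lats  = [points[i][2] for i in front]
    def norm(v, lo, hi):
        return 0.0 if hi == lo else (v - lo) / (hi - lo)
    best, best_d = None, float("inf")
    for i in front:
        a, c, l = points[i]
        d = ((1 - norm(a, min(accs), max(accs))) ** 2
             + norm(c, min(costs), max(costs)) ** 2
             + norm(l, min(lats), max(lats)) ** 2) ** 0.5
        if d < best_d:
            best, best_d = i, d
    return best

# Hypothetical measurements for k = 1, 2, 4, 8 reasoning passes:
# (accuracy, relative cost, latency in seconds) -- illustrative only.
ks = [1, 2, 4, 8]
obs = [(0.62, 1.0, 0.5), (0.74, 2.1, 0.9),
       (0.81, 4.3, 1.8), (0.83, 8.8, 3.6)]
front = pareto_front(obs)
k_star = ks[knee_point(obs, front)]
print("Pareto-optimal k:", [ks[i] for i in front], "| knee k:", k_star)
```

With these numbers every candidate is Pareto-optimal (accuracy and the two costs rise together), so the 1D and 2D views cannot break the tie; the knee criterion selects the candidate with the best joint trade-off.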
Related papers
- Towards Robust Scaling Laws for Optimizers [89.21160945066737]
Empirical scaling laws are widely used to predict loss as model size and training data grow. We show that Chinchilla-style scaling laws emerge naturally as a result of loss decomposition into irreducible, approximation, and optimization errors.
arXiv Detail & Related papers (2026-02-07T21:40:33Z) - Tail-Aware Post-Training Quantization for 3D Geometry Models [58.79500829118265]
Post-Training Quantization (PTQ) enables efficient inference without retraining. PTQ fails to transfer effectively to 3D models due to intricate feature distributions and prohibitive calibration overhead. We propose TAPTQ, a Tail-Aware Post-Training Quantization pipeline for 3D geometric learning.
arXiv Detail & Related papers (2026-02-02T07:21:15Z) - ROOT: Robust Orthogonalized Optimizer for Neural Network Training [47.05662448082334]
Training large language models (LLMs) remains a critical challenge, particularly as model scaling exacerbates sensitivity to imprecision and training instability. First, we develop a dimension-robustization scheme that enhances robustness through iterations tailored to specific matrix sizes. Second, we introduce an optimization-robustization framework that suppresses outlier noise while preserving meaningful directions.
arXiv Detail & Related papers (2025-11-25T18:48:05Z) - Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics [13.621560002904873]
Learnable SMPLify is a neural framework that replaces the iterative fitting process in SMPLify with a single-pass regression model. It achieves nearly 200x faster runtime compared to SMPLify, generalizes well to unseen 3DPW and RICH, and operates in a model-agnostic manner when used as a plug-in tool on LucidAction.
arXiv Detail & Related papers (2025-08-19T06:53:57Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to scene understanding. This paper proposes a vision-based framework with three targeted improvements. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction [5.285847977231642]
3D semantic occupancy prediction is crucial for ensuring the safety in autonomous driving.
Existing fusion-based occupancy methods typically involve performing a 2D-to-3D view transformation on image features.
We propose OccLoff, a framework that Learns to optimize Feature Fusion for 3D occupancy prediction.
arXiv Detail & Related papers (2024-11-06T06:34:27Z) - Uncertainty-Aware Testing-Time Optimization for 3D Human Pose Estimation [65.91490997921859]
We propose an Uncertainty-Aware testing-time Optimization (UAO) framework for 3D human pose estimation. The framework keeps the prior information of the pre-trained model and alleviates the overfitting problem using the uncertainty of joints. Our approach outperforms the previous best result by a large margin of 5.5% on Human3.6M.
arXiv Detail & Related papers (2024-02-04T04:28:02Z) - iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching [14.737266480464156]
We present a method named iComMa to address the 6D camera pose estimation problem in computer vision.
We propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS).
arXiv Detail & Related papers (2023-12-14T15:31:33Z) - Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning [23.200130129530653]
Existing 3D human pose estimation algorithms trained on distortion-free datasets suffer performance drop when applied to new scenarios with a specific camera distortion.
We propose a simple yet effective model for 3D human pose estimation in video that can quickly adapt to any distortion environment.
arXiv Detail & Related papers (2021-11-30T01:35:04Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for adaptive scaling. We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.