On the Scaling of Robustness and Effectiveness in Dense Retrieval
- URL: http://arxiv.org/abs/2505.24279v1
- Date: Fri, 30 May 2025 06:57:27 GMT
- Title: On the Scaling of Robustness and Effectiveness in Dense Retrieval
- Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
- Abstract summary: Robustness and effectiveness are critical aspects of developing dense retrieval models for real-world applications. Recent work has addressed scaling laws of effectiveness in dense retrieval, revealing a power-law relationship between effectiveness and the size of models and data. We find that robustness and effectiveness exhibit different scaling patterns, leading to significant resource costs when jointly improving both.
- Score: 111.58315434849047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness and effectiveness are critical aspects of developing dense retrieval models for real-world applications. It is known that there is a trade-off between the two. Recent work has addressed scaling laws of effectiveness in dense retrieval, revealing a power-law relationship between effectiveness and the size of models and data. Does robustness follow scaling laws too? If so, can scaling improve both robustness and effectiveness together, or do they remain locked in a trade-off? To answer these questions, we conduct a comprehensive experimental study. We find that: (i) Robustness, including out-of-distribution and adversarial robustness, also follows a scaling law. (ii) Robustness and effectiveness exhibit different scaling patterns, leading to significant resource costs when jointly improving both. Given these findings, we shift to the third factor that affects model performance, namely the optimization strategy, beyond the model size and data size. We find that: (i) By fitting different optimization strategies, the joint performance of robustness and effectiveness traces out a Pareto frontier. (ii) When the optimization strategy strays from Pareto efficiency, the joint performance scales in a sub-optimal direction. (iii) By adjusting the optimization weights to fit the Pareto efficiency, we can achieve Pareto training, where the scaling of joint performance becomes most efficient. Even without requiring additional resources, Pareto training achieves performance comparable to scaling resources several times over under optimization strategies that overly prioritize either robustness or effectiveness. Finally, we demonstrate that our findings can help deploy dense retrieval models in real-world applications that scale efficiently and are balanced for robustness and effectiveness.
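As a rough illustration of the Pareto frontier mentioned in the abstract, the sketch below filters a set of (effectiveness, robustness) scores down to the non-dominated points. It is a generic textbook procedure, not the paper's training method; the function name `pareto_frontier` and the example scores are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each point is (effectiveness, robustness); higher is better on both axes.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both metrics."""
    # Visit points from most to least effective; ties are ordered by robustness.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_robustness = float("-inf")
    for eff, rob in ordered:
        # Keep a point only if it is more robust than every point seen so far,
        # all of which are at least as effective.
        if rob > best_robustness:
            frontier.append((eff, rob))
            best_robustness = rob
    return frontier


# Hypothetical scores for five optimization strategies (not from the paper).
scores = [(0.40, 0.10), (0.38, 0.20), (0.35, 0.15), (0.30, 0.30), (0.30, 0.25)]
print(pareto_frontier(scores))
```

In this toy setting, a strategy that falls off the returned list is one where some other strategy is at least as good on both metrics, which is the sense in which the abstract describes straying from Pareto efficiency as sub-optimal.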
Related papers
- Toward Efficient Agents: Memory, Tool learning, and Planning [96.93533945696156]
This paper investigates efficiency from three core components of agents: memory, tool learning, and planning, considering costs such as latency, tokens, steps, etc.
arXiv Detail & Related papers (2026-01-20T17:51:56Z)
- Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression [53.39128997308138]
We introduce information capacity, a measure of model efficiency based on text compression performance. Empirical evaluations on mainstream open-source models show that models of varying sizes within a series exhibit consistent information capacity. A distinctive feature of information capacity is that it incorporates tokenizer efficiency, which affects both input and output token counts.
arXiv Detail & Related papers (2025-11-11T10:07:32Z)
- WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking [60.35109192765302]
Information seeking is a core capability that enables autonomous reasoning and decision-making. We propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories. Our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
arXiv Detail & Related papers (2025-10-28T17:51:42Z)
- How does the optimizer implicitly bias the model merging loss landscape? [66.96572894292895]
We show that a single quantity -- the effective noise scale -- unifies the impact of inference and data choices on model merging. Across datasets, the effectiveness of merging success is a non-monotonic function of effective noise, with a distinct optimum.
arXiv Detail & Related papers (2025-10-06T10:56:41Z)
- HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning [0.0]
HEFT is a novel hierarchical adaptation strategy that composes two distinct PEFT methods in a coarse-to-fine manner. A model fine-tuned for only three epochs with our HEFT strategy achieves an accuracy of 85.17%, exceeding the performance of models trained for 20 epochs.
arXiv Detail & Related papers (2025-09-11T19:06:46Z)
- Towards Better Correctness and Efficiency in Code Generation [47.06216040246783]
We propose an efficiency-oriented reinforcement learning framework guided by a novel performance reward. Online exploration is most effective when starting from a high-correctness baseline. Experiments show the effectiveness of the method, which improves code correctness by 10.18% and runtime efficiency by 7.75% on a 7B model.
arXiv Detail & Related papers (2025-08-24T16:47:19Z)
- Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
- Clustering-based Meta Bayesian Optimization with Theoretical Guarantee [9.821653903127107]
We propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach effectively partitions transferred meta-functions into highly homogeneous clusters, learns the geometry-based surrogate prototype, and adaptively synthesizes meta-priors during the online phase.
arXiv Detail & Related papers (2025-03-08T06:46:28Z)
- Towards Fair Class-wise Robustness: Class Optimal Distribution Adversarial Training [1.5565181723989001]
Adversarial training has proven to be a highly effective method for improving the robustness of deep neural networks against adversarial attacks. It has been observed to exhibit a limitation in terms of robust fairness, characterized by a significant disparity in robustness across different classes. Recent efforts to mitigate this problem have turned to class-wise-weighted methods. This paper proposes a novel min-max training framework, Class Optimal Distribution Adversarial Training.
arXiv Detail & Related papers (2025-01-08T14:19:03Z)
- Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving. Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
- Towards Resilient and Efficient LLMs: A Comparative Study of Efficiency, Performance, and Adversarial Robustness [0.0]
We investigate the trade-off between efficiency, performance, and adversarial robustness of Large Language Models (LLMs).
We conduct experiments on three prominent models with varying levels of complexity and efficiency -- Transformer++, Gated Linear Attention (GLA) Transformer, and MatMul-Free LM.
Our results show that while the GLA Transformer and MatMul-Free LM achieve slightly lower accuracy on GLUE tasks, they demonstrate higher efficiency and either superior or comparative robustness on AdvGLUE tasks.
arXiv Detail & Related papers (2024-08-08T16:54:40Z)
- Deconstructing What Makes a Good Optimizer for Language Models [7.9224468703944115]
We compare several optimization algorithms, including SGD, Adafactor, Adam, Lion, and Sophia. No single algorithm emerged as a clear winner in terms of performance or stability to hyperparameter misspecification.
arXiv Detail & Related papers (2024-07-10T18:11:40Z)
- Effective Interplay between Sparsity and Quantization: From Theory to Practice [33.697590845745815]
We show how sparsity and quantization interact when combined together. We show that even if applied in the correct order, the compounded errors from sparsity and quantization can significantly harm accuracy. Our findings extend to the efficient deployment of large models in resource-constrained compute platforms.
arXiv Detail & Related papers (2024-05-31T15:34:13Z)
- Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks [6.596361762662328]
Internal structure and operation mechanism of large-scale language models are analyzed theoretically.
We evaluate the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies.
arXiv Detail & Related papers (2024-05-20T00:10:00Z)
- Scaling Laws for Sparsely-Connected Foundation Models [70.41266138010657]
We explore the impact of parameter sparsity on the scaling behavior of Transformers trained on massive datasets.
We identify the first scaling law describing the relationship between weight sparsity, number of non-zero parameters, and amount of training data.
arXiv Detail & Related papers (2023-09-15T16:29:27Z)
- FedDUAP: Federated Learning with Dynamic Update and Adaptive Pruning Using Shared Data on the Server [64.94942635929284]
Federated Learning (FL) suffers from two critical challenges, i.e., limited computational resources and low training efficiency.
We propose a novel FL framework, FedDUAP, to exploit the insensitive data on the server and the decentralized data in edge devices.
By integrating the two original techniques together, our proposed FL model, FedDUAP, significantly outperforms baseline approaches in terms of accuracy (up to 4.8% higher), efficiency (up to 2.8 times faster), and computational cost (up to 61.9% smaller).
arXiv Detail & Related papers (2022-04-25T10:00:00Z)
- Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference [119.19779637025444]
Deep networks were recently suggested to face the odds between accuracy (on clean natural images) and robustness (on adversarially perturbed images).
This paper studies multi-exit networks associated with input-adaptive inference, showing their strong promise in achieving a "sweet point" in cooptimizing model accuracy, robustness and efficiency.
arXiv Detail & Related papers (2020-02-24T00:40:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.