PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2504.08386v1
- Date: Fri, 11 Apr 2025 09:38:12 GMT
- Title: PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation
- Authors: Arman Khaledian, Amirreza Ghadiridehkordi, Nariman Khaledian
- Abstract summary: High-dimensional language model embeddings can present scalability challenges in terms of storage and latency. This paper investigates the use of Principal Component Analysis (PCA) to reduce embedding dimensionality. We show that PCA-based compression offers a viable balance between retrieval fidelity and resource efficiency.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for grounding large language models in external knowledge sources, improving the precision of agents' responses. However, high-dimensional language model embeddings, often in the range of hundreds to thousands of dimensions, can present scalability challenges in terms of storage and latency, especially when processing massive financial text corpora. This paper investigates the use of Principal Component Analysis (PCA) to reduce embedding dimensionality, thereby mitigating computational bottlenecks without incurring large accuracy losses. We experiment with a real-world dataset and compare different similarity and distance metrics under both full-dimensional and PCA-compressed embeddings. Our results show that reducing vectors from 3,072 to 110 dimensions provides a sizeable (up to $60\times$) speedup in retrieval operations and a $\sim 28.6\times$ reduction in index size, with only moderate declines in correlation metrics relative to human-annotated similarity scores. These findings demonstrate that PCA-based compression offers a viable balance between retrieval fidelity and resource efficiency, essential for real-time systems such as Zanista AI's \textit{Newswitch} platform. Ultimately, our study underscores the practicality of leveraging classical dimensionality reduction techniques to scale RAG architectures for knowledge-intensive applications in finance and trading, where speed, memory efficiency, and accuracy must be jointly optimized.
Related papers
- ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.
This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.
Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
arXiv Detail & Related papers (2025-03-24T13:11:22Z) - Value-Based Deep RL Scales Predictably [100.21834069400023]
We show that value-based off-policy RL methods are predictable despite community lore regarding their pathological behavior.
We validate our approach using three algorithms: SAC, BRO, and PQL on DeepMind Control, OpenAI Gym, and IsaacGym.
arXiv Detail & Related papers (2025-02-06T18:59:47Z) - Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval [1.6874375111244329]
State-of-the-art image retrieval systems train specific neural networks for each dataset.
Off-the-shelf foundation models fall short in achieving performance comparable to dataset-specific models.
We introduce Autoencoders with Strong Variance Constraints (AE-SVC), which significantly improves the performance of foundation models.
arXiv Detail & Related papers (2024-10-09T16:05:16Z) - SCORE: A 1D Reparameterization Technique to Break Bayesian Optimization's Curse of Dimensionality [0.0]
A 1D reparametrization trick is proposed to break this curse and sustain linear time complexity for BO in high-dimensional landscapes.
This fast and scalable approach named SCORE can successfully find the global minimum of needle-in-a-haystack optimization functions.
arXiv Detail & Related papers (2024-06-18T14:28:29Z) - LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z) - Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z) - Robust Principal Component Analysis using Density Power Divergence [8.057006406834466]
We introduce a novel robust PCA estimator based on the minimum density power divergence estimator.
Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods.
arXiv Detail & Related papers (2023-09-24T02:59:39Z) - Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains competitive accuracy-complexity trade-off compared with the state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z) - FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose a design paradigm for cost-effective network with LR representation for efficient pose estimation, named FasterPose.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant network of pose estimation, our method reduces 58% of the FLOPs and simultaneously gains 1.3% improvement of accuracy.
arXiv Detail & Related papers (2021-07-07T13:39:08Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Recovery of Linear Components: Reduced Complexity Autoencoder Designs [0.951828574518325]
We present an approach called Recovery of Linear Components (RLC), which serves as a middle ground between linear and non-linear dimensionality reduction techniques.
With the aid of synthetic and real-world case studies, we show that the RLC, when compared with an autoencoder of similar complexity, achieves higher accuracy, similar robustness to overfitting, and faster training times.
arXiv Detail & Related papers (2020-12-14T14:08:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.