Reranker Optimization via Geodesic Distances on k-NN Manifolds
- URL: http://arxiv.org/abs/2602.15860v1
- Date: Mon, 26 Jan 2026 07:55:08 GMT
- Title: Reranker Optimization via Geodesic Distances on k-NN Manifolds
- Authors: Wen G. Gong
- Abstract summary: We propose Maniscope, a geometric reranking method that computes geodesic distances on k-nearest neighbor (k-NN) manifolds. Maniscope outperforms an HNSW graph-based baseline on the three hardest datasets. Compared to cross-encoder rerankers, Maniscope comes within 2% of their accuracy at 10-45x lower latency.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current neural reranking approaches for retrieval-augmented generation (RAG) rely on cross-encoders or large language models (LLMs), requiring substantial computational resources and exhibiting latencies of 3-5 seconds per query. We propose Maniscope, a geometric reranking method that computes geodesic distances on k-nearest neighbor (k-NN) manifolds constructed over retrieved document candidates. This approach combines global cosine similarity with local manifold geometry to capture semantic structure that flat Euclidean metrics miss. Evaluating on eight BEIR benchmark datasets (1,233 queries), Maniscope outperforms an HNSW graph-based baseline on the three hardest datasets (NFCorpus: +7.0%, TREC-COVID: +1.6%, AorB: +2.8% NDCG@3) while being 3.2x faster (4.7 ms vs 14.8 ms average). Compared to cross-encoder rerankers, Maniscope comes within 2% of their accuracy at 10-45x lower latency. On TREC-COVID, LLM-Reranker provides only a +0.5% NDCG@3 improvement over Maniscope at 840x higher latency, positioning Maniscope as a practical alternative for real-time RAG deployment. The method requires O(ND + M^2 D + Mk log k) complexity, where M << N, enabling sub-10 ms latency. We plan to release Maniscope as open-source software.
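The general technique the abstract describes (build a k-NN graph over the query plus the M retrieved candidates using cosine distance, then rank documents by shortest-path geodesic distance from the query) can be sketched as follows. Maniscope's code is not yet released, so the function name, graph construction, and choice of k below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_rerank(query_vec, doc_vecs, k=5):
    """Rerank M retrieved documents by geodesic distance on a cosine k-NN graph.

    Sketch of geodesic reranking in the spirit of Maniscope; parameters
    and tie-breaking here are illustrative, not the published method.
    """
    # Stack the query (node 0) with the M candidates; normalize for cosine.
    X = np.vstack([query_vec, doc_vecs])
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Pairwise cosine distances (1 - cosine similarity), clipped at 0.
    D = 1.0 - X @ X.T
    np.clip(D, 0.0, None, out=D)
    n = len(X)
    # Keep only each node's k nearest neighbors -> sparse k-NN graph.
    rows, cols, vals = [], [], []
    for i in range(n):
        nbrs = [j for j in np.argsort(D[i]) if j != i][:k]
        for j in nbrs:
            rows.append(i)
            cols.append(j)
            vals.append(D[i, j])
    graph = csr_matrix((vals, (rows, cols)), shape=(n, n))
    # Geodesic distance = shortest path along the manifold graph.
    geo = dijkstra(graph, directed=False, indices=0)[1:]
    order = np.argsort(geo)  # document indices, nearest first
    return order, geo
```

Restricting edges to each point's k nearest neighbors is what makes the shortest-path distance follow the local manifold structure rather than cutting straight across the embedding space.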
Related papers
- AQR-HNSW: Accelerating Approximate Nearest Neighbor Search via Density-aware Quantization and Multi-stage Re-ranking [1.2690814190593385]
This paper presents Adaptive Quantization and Rerank HNSW (AQR-HNSW), a novel framework that integrates three strategies to enhance HNSW scalability. AQR-HNSW introduces (1) density-aware adaptive quantization, achieving 4x compression while preserving distance relationships; (2) multi-stage re-ranking that reduces unnecessary computations by 35%; and (3) quantization-optimized SIMD implementations delivering 16-64 operations per cycle across architectures. Evaluation on standard benchmarks demonstrates 2.5-3.3x higher queries per second (QPS) than state-of-the-art HNSW implementations while maintaining over 98% recall, with 75% memory reduction for the index graph.
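The quantize-then-rerank pattern underlying AQR-HNSW can be illustrated with a deliberately simple sketch: cheap candidate scoring on int8-quantized vectors, followed by exact re-ranking of a small shortlist. This uses plain scalar quantization and brute-force search, not the paper's density-aware scheme or an HNSW index:

```python
import numpy as np

def quantize_int8(X):
    """Per-vector symmetric int8 scalar quantization (illustrative only;
    AQR-HNSW's density-aware scheme adapts precision per region)."""
    scale = np.abs(X).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    return np.round(X / scale).astype(np.int8), scale

def two_stage_search(query, X, k=5, shortlist=20):
    """Stage 1: approximate distances on quantized vectors pick a shortlist.
    Stage 2: exact float distances re-rank only the shortlist."""
    Xq, scale = quantize_int8(X)
    approx = Xq.astype(np.float32) * scale        # dequantize once
    d1 = np.linalg.norm(approx - query, axis=1)    # cheap approximate pass
    cand = np.argsort(d1)[:shortlist]
    d2 = np.linalg.norm(X[cand] - query, axis=1)   # exact pass on few vectors
    return cand[np.argsort(d2)[:k]]
```

The point of the two stages is that quantization error only has to be small enough to keep true neighbors inside the shortlist; the exact pass then fixes the final ordering.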
arXiv Detail & Related papers (2026-02-25T05:58:16Z) - HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation [39.58684038370709]
LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots. Previous state-of-the-art methods often face a trade-off between accuracy and speed. We introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network.
arXiv Detail & Related papers (2025-10-08T10:46:07Z) - Adaptive Monitoring and Real-World Evaluation of Agentic AI Systems [3.215065407261898]
Multi-agent systems that combine large language models with external tools are rapidly transitioning from research laboratories into high-stakes domains. This "Advanced" sequel fills that gap by providing an algorithmic instantiation and empirical evidence. AMDM cuts anomaly-detection latency from 12.3 s to 5.6 s on simulated goal drift and reduces false-positive rates from 4.5% to 0.9%.
arXiv Detail & Related papers (2025-08-28T15:52:49Z) - S*: Test Time Scaling for Code Generation [55.11863577956177]
We propose S*, the first hybrid test-time scaling framework for code generation. S* substantially improves the coverage and selection accuracy of generated code.
arXiv Detail & Related papers (2025-02-20T09:18:53Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - rule4ml: An Open-Source Tool for Resource Utilization and Latency Estimation for ML Models on FPGA [0.0]
This paper introduces a novel method to predict the resource utilization and inference latency of Neural Networks (NNs) before their synthesis and implementation on FPGA.
We leverage HLS4ML, a tool-flow that helps translate NNs into high-level synthesis (HLS) code.
Our method uses trained regression models for immediate pre-synthesis predictions.
arXiv Detail & Related papers (2024-08-09T19:35:10Z) - Hybrid-Task Meta-Learning: A GNN Approach for Scalable and Transferable Bandwidth Allocation [50.96751567777229]
We develop a deep learning-based bandwidth allocation policy that is scalable with the number of users and transferable to different communication scenarios. To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN). We develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios.
arXiv Detail & Related papers (2023-12-23T04:25:12Z) - Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm.
We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency.
arXiv Detail & Related papers (2023-06-13T15:21:38Z) - Back to MLP: A Simple Baseline for Human Motion Prediction [59.18776744541904]
This paper tackles the problem of human motion prediction, i.e., forecasting future body poses from historically observed sequences.
We show that the performance of these approaches can be surpassed by a lightweight architecture based purely on multi-layer perceptrons (MLPs), with only 0.14M parameters.
An exhaustive evaluation on Human3.6M, AMASS and 3DPW datasets shows that our method, which we dub siMLPe, consistently outperforms all other approaches.
arXiv Detail & Related papers (2022-07-04T16:35:58Z) - Res-GCNN: A Lightweight Residual Graph Convolutional Neural Networks for Human Trajectory Forecasting [0.0]
We propose a Residual Graph Convolutional Neural Network (Res-GCNN), which models the interactive behaviors of pedestrians.
Results show an improvement over the state of the art by 13.3% on the Final Displacement Error (FDE), which reaches 0.65 meters.
The code will be made publicly available on GitHub.
arXiv Detail & Related papers (2020-11-18T11:18:16Z) - Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation [91.12575065731883]
We propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS).
The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to the widely adopted $\ell_n$-norm loss and IoU-based loss.
Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR.
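The CIoU loss has a standard published closed form, L = 1 - IoU + rho^2/c^2 + alpha*v, combining overlap, center distance, and aspect-ratio consistency. A minimal single-box Python rendering (independent of the authors' batched GPU implementation) might look like:

```python
import math

def ciou_loss(box1, box2, eps=1e-9):
    """Complete-IoU (CIoU) loss between two (x1, y1, x2, y2) boxes."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    w1, h1 = x2 - x1, y2 - y1
    w2, h2 = X2 - X1, Y2 - Y1
    # Intersection and union for the IoU term.
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared center distance over the squared diagonal of the
    # smallest enclosing box (normalized distance penalty).
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw * cw + ch * ch + eps
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / c2 + alpha * v
```

Unlike plain IoU loss, the distance and aspect-ratio terms keep the gradient informative even for non-overlapping boxes, which is why the paper reports consistent AP/AR gains.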
arXiv Detail & Related papers (2020-05-07T16:00:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.