DS SERVE: A Framework for Efficient and Scalable Neural Retrieval
- URL: http://arxiv.org/abs/2602.22224v1
- Date: Wed, 17 Dec 2025 00:43:10 GMT
- Title: DS SERVE: A Framework for Efficient and Scalable Neural Retrieval
- Authors: Jinjian Liu, Yichuan Wang, Xinxi Lyu, Rulin Shao, Joseph E. Gonzalez, Matei Zaharia, Sewon Min
- Abstract summary: We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.
- Score: 59.295343280892524
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time trade-offs between latency, accuracy, and result diversity. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.
Related papers
- A Co-Training Semi-Supervised Framework Using Faster R-CNN and YOLO Networks for Object Detection in Densely Packed Retail Images [1.0896567381206714]
This study proposes a semi-supervised co-training framework for object detection in densely packed retail environments.
The framework combines Faster R-CNN for precise localization with YOLO for global context.
It employs an ensemble of XGBoost, Random Forest, and SVM, utilizing diverse feature representations for higher robustness.
arXiv Detail & Related papers (2025-09-11T13:40:43Z)
- fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence [55.582429009401956]
fVDB is a novel framework for deep learning on large-scale 3D data.
Our framework is fully integrated with PyTorch, enabling interoperability with existing pipelines.
arXiv Detail & Related papers (2024-07-01T20:20:33Z)
- ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion [54.668445421149364]
Deep learning-based hyperspectral image (HSI) super-resolution aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs).
In this letter, we propose a novel adversarial automatic data augmentation framework, ADASR, that automatically optimizes and augments HSI-MSI sample pairs to enrich data diversity for HSI-MSI fusion.
arXiv Detail & Related papers (2023-10-11T07:30:37Z)
- OrcoDCS: An IoT-Edge Orchestrated Online Deep Compressed Sensing Framework [31.95604675656826]
We propose OrcoDCS, an IoT-Edge orchestrated online deep compressed sensing framework.
OrcoDCS offers high flexibility and adaptability to distinct IoT device groups and their sensing tasks.
We show analytically and empirically that OrcoDCS outperforms the state-of-the-art DCDA on training time.
arXiv Detail & Related papers (2023-08-05T04:19:35Z)
- HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-optimal performance.
We propose a hybrid end-to-end learning framework, HALSIE, to reduce inference cost by up to $20\times$ versus prior art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z)
- Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
arXiv Detail & Related papers (2022-11-01T14:38:18Z)
- Semi-supervised Network Embedding with Differentiable Deep Quantisation [81.49184987430333]
We develop d-SNEQ, a differentiable quantisation method for network embedding.
d-SNEQ incorporates a rank loss to equip the learned quantisation codes with rich high-order information.
It is able to substantially compress the size of trained embeddings, thus reducing storage footprint and accelerating retrieval speed.
arXiv Detail & Related papers (2021-08-20T11:53:05Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.