Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
- URL: http://arxiv.org/abs/2302.14534v2
- Date: Sun, 24 Mar 2024 14:34:53 GMT
- Title: Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
- Authors: Christopher Akiki, Odunayo Ogundepo, Aleksandra Piktus, Xinyu Zhang, Akintunde Oladipo, Jimmy Lin, Martin Potthast
- Abstract summary: Spacerini is a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face.
Spacerini makes state-of-the-art sparse and dense retrieval models more accessible to non-IR practitioners.
- Score: 104.2943594704532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Spacerini, a tool that integrates the Pyserini toolkit for reproducible information retrieval research with Hugging Face to enable the seamless construction and deployment of interactive search engines. Spacerini makes state-of-the-art sparse and dense retrieval models more accessible to non-IR practitioners while minimizing deployment effort. This is useful for NLP researchers who want to better understand and validate their research by performing qualitative analyses of training corpora, for IR researchers who want to demonstrate new retrieval models integrated into the growing Pyserini ecosystem, and for third parties reproducing the work of other researchers. Spacerini is open source and includes utilities for loading, preprocessing, indexing, and deploying search engines locally and remotely. We demonstrate a portfolio of 13 search engines created with Spacerini for different use cases.
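Spacerini builds on Pyserini, whose default sparse retriever scores documents with BM25 over a Lucene inverted index. To make the "indexing" and sparse-retrieval steps mentioned above concrete, here is a minimal pure-Python sketch of BM25 over a toy inverted index. It is an illustration under stated assumptions, not Spacerini's or Pyserini's API: the class name `TinyBM25`, the lowercase whitespace tokenizer, the toy documents, and the parameter values `k1=0.9`, `b=0.4` are illustrative choices made for this example.

```python
import math
from collections import Counter, defaultdict


class TinyBM25:
    """Hypothetical minimal BM25 index; illustrative only, not Pyserini's API."""

    def __init__(self, docs: dict[str, str], k1: float = 0.9, b: float = 0.4):
        self.k1, self.b = k1, b
        # Inverted index: term -> {docid: term frequency in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_len: dict[str, int] = {}
        for docid, text in docs.items():
            tokens = self._tokenize(text)
            self.doc_len[docid] = len(tokens)
            for term, tf in Counter(tokens).items():
                self.postings[term][docid] = tf
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_len.values()) / max(self.n_docs, 1)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        # Assumption: lowercase whitespace tokenization (no stemming or stopwords).
        return text.lower().split()

    def _idf(self, term: str) -> float:
        # Lucene-style BM25 IDF, which is always non-negative.
        df = len(self.postings.get(term, {}))
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        for term in self._tokenize(query):
            idf = self._idf(term)
            # Only documents in this term's posting list receive a contribution.
            for docid, tf in self.postings.get(term, {}).items():
                # Length normalization: longer-than-average docs are penalized.
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[docid] / self.avg_len)
                scores[docid] += idf * tf * (self.k1 + 1) / (tf + norm)
        # Sort by descending score, breaking ties by docid for determinism.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:k]


docs = {
    "d1": "Pyserini supports sparse and dense retrieval",
    "d2": "Hugging Face hosts datasets and Spaces",
    "d3": "sparse retrieval with BM25 over an inverted index",
}
index = TinyBM25(docs)
print(index.search("sparse retrieval", k=2))
```

Here `d1` ranks above `d3`: both contain each query term once, but `d1` is shorter, so BM25's length normalization gives it the higher score. In an actual Spacerini deployment this work would be delegated to Pyserini's Lucene-backed searcher, whose analyzer typically also applies stemming and stopword removal.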
Related papers
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.
We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.
We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
Date: 2025-06-23T17:27:19Z
- ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework [73.91207117772291]
ManuSearch is a transparent and modular multi-agent framework designed to democratize deep search for large language models (LLMs).
ManuSearch decomposes the search and reasoning process into three collaborative agents: (1) a solution planning agent that iteratively formulates sub-queries, (2) an Internet search agent that retrieves relevant documents via real-time web search, and (3) a structured webpage reading agent that extracts key evidence from raw web content.
Date: 2025-05-23T17:02:02Z
- Exploring new Approaches for Information Retrieval through Natural Language Processing [0.0]
This review paper explores recent advancements and emerging approaches in Information Retrieval (IR) applied to Natural Language Processing (NLP).
We examine traditional IR models such as Boolean, vector space, probabilistic, and inference network models, and highlight modern techniques including deep learning, reinforcement learning, and pretrained transformer models like BERT.
A comparative analysis of sparse, dense, and hybrid retrieval methods is presented, along with applications in web search engines, cross-language IR, argument mining, private information retrieval, and hate speech detection.
Date: 2025-05-04T17:37:26Z
- OpenResearcher: Unleashing AI for Accelerated Scientific Research [35.31092912532057]
We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process.
OpenResearcher is built on Retrieval-Augmented Generation (RAG) to integrate Large Language Models (LLMs) with up-to-date, domain-specific knowledge.
We develop various tools for OpenResearcher to understand researchers' queries, search from the scientific literature, filter retrieved information, provide accurate and comprehensive answers, and self-refine these answers.
Date: 2024-08-13T14:59:44Z
- The Use of Generative Search Engines for Knowledge Work and Complex Tasks [26.583783763090732]
We analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search.
Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.
Date: 2024-03-19T18:17:46Z
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
Date: 2023-06-02T12:09:59Z
- CrossBeam: Learning to Search in Bottom-Up Program Synthesis [51.37514793318815]
We propose training a neural model to learn a hands-on search policy for bottom-up synthesis.
Our approach, called CrossBeam, uses the neural model to choose how to combine previously-explored programs into new programs.
We observe that CrossBeam learns to search efficiently, exploring much smaller portions of the program space compared to the state-of-the-art.
Date: 2022-03-20T04:41:05Z
- Searching the Search Space of Vision Transformer [98.96601221383209]
Vision Transformer has shown great visual representation power in substantial vision tasks such as recognition and detection.
We propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space.
We provide design guidelines of general vision transformers with extensive analysis according to the space searching process.
Date: 2021-11-29T17:26:07Z
- Boosting Search Engines with Interactive Agents [25.89284695491093]
This paper presents first steps in designing agents that learn meta-strategies for contextual query refinements.
Agents are empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results.
Date: 2021-09-01T13:11:57Z
- AutoSpace: Neural Architecture Search with Less Human Interference [84.42680793945007]
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
We propose a novel differentiable evolutionary framework named AutoSpace, which evolves the search space to an optimal one.
With the learned search space, the performance of recent NAS algorithms can be improved significantly compared with using previously manually designed spaces.
Date: 2021-03-22T13:28:56Z
- A New Neural Search and Insights Platform for Navigating and Organizing AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
Date: 2020-10-30T19:12:25Z
- Mapping Researchers with PeopleMap [11.466062262579495]
PeopleMap creates visual maps for researchers based on their research interests and publications.
Requiring only the researchers' Google Scholar profiles as input, PeopleMap generates and visualizes embeddings for the researchers.
PeopleMap has received positive feedback and enthusiasm for expanding its adoption across Georgia Tech.
Date: 2020-08-31T20:46:27Z