Unlocking Insights: Semantic Search in Jupyter Notebooks
- URL: http://arxiv.org/abs/2402.13234v1
- Date: Tue, 20 Feb 2024 18:49:41 GMT
- Title: Unlocking Insights: Semantic Search in Jupyter Notebooks
- Authors: Lan Li, Jinpeng Lv
- Abstract summary: We investigate the application of large language models to enhance semantic search capabilities.
Our objective is to retrieve generated outputs, such as figures or tables, associated functions and methods, and other pertinent information.
We demonstrate a semantic search framework that achieves a comprehensive semantic understanding of the entire notebook's contents.
- Score: 1.320904960556043
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Semantic search, a process aimed at delivering highly relevant search results by comprehending the searcher's intent and the contextual meaning of terms within a searchable dataspace, plays a pivotal role in information retrieval. In this paper, we investigate the application of large language models to enhance semantic search capabilities, specifically tailored for the domain of Jupyter Notebooks. Our objective is to retrieve generated outputs, such as figures or tables, associated functions and methods, and other pertinent information.
We demonstrate a semantic search framework that achieves a comprehensive semantic understanding of the entire notebook's contents, enabling it to effectively handle various types of user queries. Key components of this framework include:
1) A data preprocessor is designed to handle diverse types of cells within Jupyter Notebooks, encompassing both markdown and code cells.
2) An innovative methodology is devised to address token size limitations that arise with code-type cells. We implement a finer-grained approach to data input, transitioning from the cell level to the function level, effectively resolving these issues.
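The abstract does not spell out how the preprocessor or the function-level split is implemented, so the following is only a minimal sketch of the idea under stated assumptions: the notebook is a dict in the standard nbformat v4 JSON layout, code cells are split with Python's `ast` module, and the helper names `split_code_cell` and `preprocess_notebook` are hypothetical, not taken from the paper.

```python
import ast


def split_code_cell(source: str) -> list[dict]:
    """Split one code cell into function-level chunks plus one remainder chunk.

    Hypothetical helper: an illustration of cell-to-function granularity,
    not the authors' implementation.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Cells containing IPython magics (e.g. %matplotlib) are not valid
        # Python, so fall back to keeping the whole cell as a single chunk.
        return [{"kind": "code", "name": None, "text": source}]

    chunks, rest = [], []
    for node in tree.body:
        # Exact source text of this top-level statement
        # (decorator lines of a function are not included).
        segment = ast.get_source_segment(source, node)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append({"kind": "function", "name": node.name, "text": segment})
        else:
            rest.append(segment)
    if rest:
        # All non-function statements of the cell form one leftover chunk.
        chunks.append({"kind": "code", "name": None, "text": "\n".join(rest)})
    return chunks


def preprocess_notebook(nb: dict) -> list[dict]:
    """Turn markdown and code cells into small chunks ready for embedding."""
    chunks = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            parts = [{"kind": "markdown", "name": None, "text": source}]
        elif cell.get("cell_type") == "code":
            parts = split_code_cell(source)
        else:
            continue  # skip raw cells in this sketch
        for part in parts:
            part["cell"] = index  # remember which cell the chunk came from
            chunks.append(part)
    return chunks


# Tiny in-memory notebook (made-up content) to exercise the sketch.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Circle maths\n", "Helper functions."]},
        {
            "cell_type": "code",
            "source": (
                "import math\n\n"
                "def area(r):\n    return math.pi * r ** 2\n\n"
                "def double(x):\n    return 2 * x\n\n"
                "print(area(1.0))\n"
            ),
        },
    ]
}

for chunk in preprocess_notebook(notebook):
    print(chunk["cell"], chunk["kind"], chunk["name"])
```

Each function becomes its own chunk, so a long code cell no longer has to fit into a single model input; whether the paper also keeps the leftover top-level statements as a separate chunk is an assumption of this sketch.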
Related papers
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment [0.0]
We propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini.
We evaluate the impact of contextual information (i.e., dataset description) on the classification outcomes.
arXiv Detail & Related papers (2024-03-01T10:01:36Z)
- Language Models As Semantic Indexers [78.83425357657026]
We introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model.
We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval.
arXiv Detail & Related papers (2023-10-11T18:56:15Z)
- NS3: Neuro-Symbolic Semantic Code Search [33.583344165521645]
We use a Neural Module Network architecture to implement this idea.
We compare our model - NS3 (Neuro-Symbolic Semantic Search) - to a number of baselines, including state-of-the-art semantic code retrieval methods.
We demonstrate that our approach results in more precise code retrieval, and we study the effectiveness of our modular design when handling compositional queries.
arXiv Detail & Related papers (2022-05-21T20:55:57Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Unbiased Sentence Encoder For Large-Scale Multi-lingual Search Engines [0.0]
We present a multi-lingual sentence encoder that can be used in search engines as a query and document encoder.
This embedding enables a semantic similarity score between queries and documents that can be an important feature in document ranking and relevancy.
arXiv Detail & Related papers (2021-03-01T07:19:16Z)
- Intent Classification and Slot Filling for Privacy Policies [34.606121042708864]
PolicyIE is a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications.
We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging task and (2) modeling them as a sequence-to-sequence learning task.
arXiv Detail & Related papers (2021-01-01T00:44:41Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets with the unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
- Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
- Deep Search Query Intent Understanding [17.79430887321982]
This paper aims to provide a comprehensive learning framework for modeling query intent under different stages of a search.
We focus on the design for 1) predicting users' intents as they type in queries on-the-fly in typeahead search using character-level models; and 2) accurate word-level intent prediction models for complete queries.
arXiv Detail & Related papers (2020-08-15T18:19:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.