Syntactic Search by Example
- URL: http://arxiv.org/abs/2006.03010v1
- Date: Thu, 4 Jun 2020 16:59:01 GMT
- Title: Syntactic Search by Example
- Authors: Micah Shlain, Hillel Taub-Tabib, Shoval Sadde, Yoav Goldberg
- Abstract summary: We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs.
We introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations.
Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
- Score: 44.69040040007045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a system that allows a user to search a large linguistically
annotated corpus using syntactic patterns over dependency graphs. In contrast
to previous attempts to this effect, we introduce a light-weight query language
that does not require the user to know the details of the underlying syntactic
representations, and instead lets the user query the corpus by providing an example
sentence coupled with simple markup. Search is performed at an interactive
speed due to an efficient linguistic graph-indexing and retrieval engine. This
allows for rapid exploration, development and refinement of syntax-based
queries. We demonstrate the system using queries over two corpora: the English
Wikipedia, and a collection of English PubMed abstracts. A demo of the
Wikipedia system is available at: https://allenai.github.io/spike
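The abstract attributes interactive search speed to indexing the dependency graphs. As a rough illustration of that general idea only (not the system's actual engine or query language, which this page does not describe), the sketch below builds an inverted index from dependency edges to sentence IDs and answers a pattern by intersecting posting lists. The toy parses, the relation labels, and the names `CORPUS`, `build_index`, and `search` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy corpus: each sentence is a list of
# (head_lemma, relation, dependent_word) edges, written by hand.
# A real pipeline would obtain these edges from a dependency parser.
CORPUS = {
    0: [("found", "nsubj", "Paul"), ("found", "dobj", "Microsoft")],
    1: [("found", "nsubj", "Larry"), ("found", "dobj", "Google")],
    2: [("visit", "nsubj", "Paul"), ("visit", "dobj", "Google")],
}


def build_index(corpus):
    """Map each (head_lemma, relation) pair to the set of sentence IDs containing it."""
    index = defaultdict(set)
    for sid, edges in corpus.items():
        for head, rel, _dep in edges:
            index[(head, rel)].add(sid)
    return index


def search(index, corpus, pattern):
    """Find sentences matching every (head_lemma, relation, capture_name) in pattern.

    Returns a list of (sentence_id, {capture_name: dependent_word}) sorted by ID.
    Simplification: edges are matched by lemma, so this toy version does not
    check that all edges attach to the same head token.
    """
    postings = [index.get((head, rel), set()) for head, rel, _name in pattern]
    candidates = set.intersection(*postings) if postings else set()
    results = []
    for sid in sorted(candidates):
        captures = {}
        for head, rel, name in pattern:
            for h, r, dep in corpus[sid]:
                if (h, r) == (head, rel):
                    captures[name] = dep
        results.append((sid, captures))
    return results


if __name__ == "__main__":
    idx = build_index(CORPUS)
    # Pattern: "<founder> founded <entity>", expressed as two edges of "found".
    query = [("found", "nsubj", "founder"), ("found", "dobj", "entity")]
    for sid, caps in search(idx, CORPUS, query):
        print(sid, caps)
```

Only sentences whose posting lists overlap for every edge in the pattern are inspected, which is why an index of this kind can keep lookups fast as the corpus grows.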
Related papers
- QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called QueryBuilder.
It allows a novice, English-speaking user to create queries with a small amount of effort.
It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
arXiv Detail & Related papers (2024-09-07T00:46:58Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models [9.808214545408541]
LinguisticLens is a novel interactive visualization tool for making sense of and analyzing the syntactic diversity of datasets.
It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples.
arXiv Detail & Related papers (2023-05-19T00:53:45Z)
- Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval [37.22592489907125]
We study how sparse language models can be used for dense retrieval to improve inference efficiency.
We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
arXiv Detail & Related papers (2023-03-31T20:21:32Z)
- Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof.
We present two different semantic parsing approaches and highlight the challenges of the task.
Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Interactive Extractive Search over Biomedical Corpora [41.72755714431404]
We present a system that allows life-science researchers to search a linguistically annotated corpus of texts.
We introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations.
Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
arXiv Detail & Related papers (2020-06-07T13:26:32Z)
- A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation [16.914116942666976]
We introduce a novel methodology to efficiently construct a corpus for question answering over structured data.
In our method, we randomly generate operation trees (OTs) from a context-free grammar.
We apply the method to create a new corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus.
arXiv Detail & Related papers (2020-04-16T12:50:01Z)