Related papers: Syntactic Search by Example

URL: http://arxiv.org/abs/2006.03010v1
Date: Thu, 4 Jun 2020 16:59:01 GMT
Title: Syntactic Search by Example
Authors: Micah Shlain, Hillel Taub-Tabib, Shoval Sadde, Yoav Goldberg
Abstract summary: We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. We introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
Score: 44.69040040007045
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations, and instead lets them query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of syntax-based queries. We demonstrate the system using queries over two corpora: the English Wikipedia, and a collection of English PubMed abstracts. A demo of the Wikipedia system is available at: https://allenai.github.io/spike

Related papers

A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models [0.0]
We present an automated pipeline for estimating Verb Frame Frequencies (VFFs). VFFs provide a powerful window into syntax in both human and machine language systems. We use large language models (LLMs) to generate a corpus of sentences containing 476 English verbs.
arXiv Detail & Related papers (2025-07-29T19:30:11Z)
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches [5.80278230280824]
We propose a novel algorithm that achieves semantic yet efficient pattern matching by relaxing a surface-level matching with word embeddings. Our experiments demonstrate that the proposed method can execute searches on billion-scale corpora in less than a second.
arXiv Detail & Related papers (2025-03-05T17:53:11Z)
QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called QueryBuilder. It allows a novice, English-speaking user to create queries with a small amount of effort. It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
arXiv Detail & Related papers (2024-09-07T00:46:58Z)
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models [9.808214545408541]
LinguisticLens is a novel interactive visualization tool for making sense of and analyzing the syntactic diversity of datasets. It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples.
arXiv Detail & Related papers (2023-05-19T00:53:45Z)
Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval [37.22592489907125]
We study how sparse language models can be used for dense retrieval to improve inference efficiency. We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
arXiv Detail & Related papers (2023-03-31T20:21:32Z)
Semantic Parsing for Conversational Question Answering over Knowledge Graphs [63.939700311269156]
We develop a dataset where user questions are annotated with SPARQL parses and system answers correspond to execution results thereof. We present two different semantic parsing approaches and highlight the challenges of the task. Our dataset and models are released at https://github.com/Edinburgh/SPICE.
arXiv Detail & Related papers (2023-01-28T14:45:11Z)
Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures. We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees. Experiments on the English and Chinese portions of the OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
Interactive Extractive Search over Biomedical Corpora [41.72755714431404]
We present a system that allows life-science researchers to search a linguistically annotated corpus of texts. We introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
arXiv Detail & Related papers (2020-06-07T13:26:32Z)
A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation [16.914116942666976]
We introduce a novel methodology to efficiently construct a corpus for question answering over structured data. In our method, we randomly generate OTs from a context-free grammar. We apply the method to create a new corpus OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus.
arXiv Detail & Related papers (2020-04-16T12:50:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.