Interactive Extractive Search over Biomedical Corpora
- URL: http://arxiv.org/abs/2006.04148v1
- Date: Sun, 7 Jun 2020 13:26:32 GMT
- Title: Interactive Extractive Search over Biomedical Corpora
- Authors: Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal,
Yaara Cohen, Yoav Goldberg
- Abstract summary: We present a system that allows life-science researchers to search a linguistically annotated corpus of texts.
We introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations.
Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
- Score: 41.72755714431404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a system that allows life-science researchers to search a
linguistically annotated corpus of scientific texts using patterns over
dependency graphs, as well as using patterns over token sequences and a
powerful variant of boolean keyword queries. In contrast to previous attempts
at dependency-based search, we introduce a light-weight query language that
does not require the user to know the details of the underlying linguistic
representations, and instead lets them query the corpus by providing an example
sentence coupled with simple markup. Search is performed at an interactive
speed due to an efficient linguistic graph-indexing and retrieval engine. This
allows for rapid exploration, development and refinement of user queries. We
demonstrate the system using example workflows over two corpora: the PubMed
corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a
collection of over 45,000 research papers focused on COVID-19 research. The
system is publicly available at https://allenai.github.io/spike
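To make the idea of "patterns over dependency graphs" concrete, here is a minimal sketch in Python. It is not the system's query language or engine: the sentences, lemmas, and dependency labels are hand-annotated toy data, the names `CORPUS`, `match`, and `search` are hypothetical, and the lookup is a plain linear scan rather than an index. It only illustrates how a pattern anchored on a verb can capture the words attached to it by labeled edges.

```python
# Toy corpus: each sentence has tokens as (word, lemma) pairs and
# dependency edges as (head_index, label, dependent_index).
# All annotations are hand-written for illustration (assumption).
CORPUS = [
    {
        "tokens": [("Aspirin", "aspirin"), ("inhibits", "inhibit"), ("COX-1", "cox-1")],
        "edges": [(1, "nsubj", 0), (1, "dobj", 2)],
    },
    {
        "tokens": [
            ("Remdesivir", "remdesivir"), ("strongly", "strongly"),
            ("inhibits", "inhibit"), ("viral", "viral"),
            ("replication", "replication"),
        ],
        "edges": [(2, "nsubj", 0), (2, "advmod", 1), (2, "dobj", 4), (4, "amod", 3)],
    },
    {
        "tokens": [("Patients", "patient"), ("received", "receive"), ("aspirin", "aspirin")],
        "edges": [(1, "nsubj", 0), (1, "dobj", 2)],
    },
]


def match(sentence, anchor_lemma, captures):
    """Return one dict of captured words for each anchor token
    that has every edge label requested in `captures`."""
    results = []
    for i, (_word, lemma) in enumerate(sentence["tokens"]):
        if lemma != anchor_lemma:
            continue
        found = {}
        for name, label in captures.items():
            deps = [d for (h, l, d) in sentence["edges"] if h == i and l == label]
            if not deps:
                break  # a required edge is missing, so this anchor does not match
            found[name] = sentence["tokens"][deps[0]][0]
        else:
            results.append(found)
    return results


def search(corpus, anchor_lemma, captures):
    """Scan every sentence and collect all pattern matches (no index, linear scan)."""
    hits = []
    for sentence in corpus:
        hits.extend(match(sentence, anchor_lemma, captures))
    return hits


if __name__ == "__main__":
    for hit in search(CORPUS, "inhibit", {"agent": "nsubj", "target": "dobj"}):
        print(hit)
```

Because the pattern is defined over graph edges rather than word positions, the second sentence still matches even though an adverb and an adjective sit between the verb and its arguments. A token-sequence pattern would need extra wildcards to cover the same case.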
Related papers
- ELCC: the Emergent Language Corpus Collection [1.6574413179773761]
The Emergent Language Corpus Collection (ELCC) is a collection of corpora collected from open source implementations of emergent communication systems.
Each corpus is annotated with metadata describing the characteristics of the source system as well as a suite of analyses of the corpus.
(2024-07-04T21:23:18Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
(2023-12-11T18:57:35Z)
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
(2023-09-04T20:52:33Z)
- SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation [11.186252009101077]
We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper.
SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system.
(2023-06-06T09:34:45Z)
- Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval [37.22592489907125]
We study how sparse language models can be used for dense retrieval to improve inference efficiency.
We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
(2023-03-31T20:21:32Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
(2021-05-31T21:14:58Z)
- Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks.
We first represent both natural language query texts and programming language code snippets as unified graph-structured data.
In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
(2020-10-24T14:16:50Z)
- Syntactic Search by Example [44.69040040007045]
We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs.
We introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations.
Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
(2020-06-04T16:59:01Z)