Related papers: Interactive Extractive Search over Biomedical Corpora

URL: http://arxiv.org/abs/2006.04148v1
Date: Sun, 7 Jun 2020 13:26:32 GMT
Title: Interactive Extractive Search over Biomedical Corpora
Authors: Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg
Abstract summary: We present a system that allows life-science researchers to search a linguistically annotated corpus of texts. We introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
Score: 41.72755714431404
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead lets the user query the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora: the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a collection of over 45,000 research papers focused on COVID-19 research. The system is publicly available at https://allenai.github.io/spike
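The abstract describes querying by example: the user marks tokens in an example sentence, and the system retrieves sentences with a similar structure. The sketch below illustrates one way such a query could be derived and matched. It assumes pre-parsed sentences with hand-written Universal Dependencies-style labels, and a pattern made of the dependency path between two marked tokens plus the word at the top of that path. It is not SPIKE's query syntax or engine: `Token`, `dep_path` and `search` are hypothetical names, and the brute-force scan stands in for the graph index the paper relies on for interactive speed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    text: str
    head: int  # index of the syntactic head; -1 marks the root
    dep: str   # label of the arc from the head to this token

def ancestors(sent, i):
    """Return [i, head(i), head(head(i)), ...] up to the root."""
    chain = [i]
    while sent[i].head != -1:
        i = sent[i].head
        chain.append(i)
    return chain

def dep_path(sent, a, b):
    """Arcs from token a up to the lowest common ancestor, then down to b, plus that ancestor's word."""
    up_a, up_b = ancestors(sent, a), ancestors(sent, b)
    lca = next(n for n in up_a if n in up_b)
    up = [("^", sent[n].dep) for n in up_a[:up_a.index(lca)]]
    down = [("v", sent[n].dep) for n in reversed(up_b[:up_b.index(lca)])]
    return tuple(up + down), sent[lca].text

def search(corpus, pattern):
    """Hypothetical brute-force scan: yield (arg1, arg2) pairs whose path and pivot word equal the pattern."""
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(len(sent)):
                if i != j and dep_path(sent, i, j) == pattern:
                    yield sent[i].text, sent[j].text

# Example sentence "Aspirin inhibits COX-2", with the user marking
# "Aspirin" as arg1 (token 0) and "COX-2" as arg2 (token 2).
example = [Token("Aspirin", 1, "nsubj"), Token("inhibits", -1, "root"), Token("COX-2", 1, "obj")]
pattern = dep_path(example, 0, 2)

corpus = [
    [Token("Ibuprofen", 1, "nsubj"), Token("inhibits", -1, "root"), Token("COX-1", 1, "obj")],
    # Same path, different pivot word, so no match.
    [Token("Metformin", 1, "nsubj"), Token("activates", -1, "root"), Token("AMPK", 1, "obj")],
    # Passive voice: similar meaning, different dependency path, so no match.
    [Token("COX-2", 2, "nsubj:pass"), Token("is", 2, "aux:pass"), Token("inhibited", -1, "root"),
     Token("by", 4, "case"), Token("aspirin", 2, "obl")],
]
print(list(search(corpus, pattern)))  # [('Ibuprofen', 'COX-1')]
```

The passive sentence shows why a practical system would need to relax or expand such patterns: an exact path match is precise but misses paraphrases that a user would consider equivalent.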

Related papers

Intelligent Scientific Literature Explorer using Machine Learning (ISLE) [0.797970449705065]
This paper presents an integrated system for scientific literature exploration that combines large-scale data acquisition, hybrid retrieval, semantic topic modeling, and heterogeneous knowledge graph construction. The proposed framework contributes a foundation for AI-assisted scientific discovery.
arXiv (2025-12-14T16:54:24Z)
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches [5.80278230280824]
We propose a novel algorithm that achieves semantic yet efficient pattern matching by relaxing surface-level matching with word embeddings. Our experiments demonstrate that the proposed method can execute searches on billion-scale corpora in less than a second.
arXiv (2025-03-05T17:53:11Z)
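A naive illustration of "relaxing surface-level matching with word embeddings": a pattern matches a window of the corpus when every aligned word pair has cosine similarity at or above a threshold, rather than being identical. This is a minimal sketch of the general idea only, not SoftMatcha's algorithm, which relies on indexing to reach sub-second search at billion-token scale. The function names, the 2-d toy embeddings and the 0.9 threshold are all invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def soft_match(pattern, tokens, emb, threshold=0.9):
    """Start offsets of windows whose words are each similar enough to the aligned pattern word."""
    n = len(pattern)
    hits = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(cosine(emb[p], emb[t]) >= threshold for p, t in zip(pattern, window)):
            hits.append(start)
    return hits

# Toy 2-d "embeddings", invented for illustration only.
emb = {
    "the": (1.0, 0.0),
    "drug": (0.0, 1.0),
    "medication": (0.1, 1.0),
    "inhibits": (0.7, 0.7),
    "blocks": (0.6, 0.8),
    "cat": (1.0, -1.0),
}
tokens = ["the", "medication", "blocks", "the", "cat"]
print(soft_match(["the", "drug", "inhibits"], tokens, emb))                 # [0]
print(soft_match(["the", "drug", "inhibits"], tokens, emb, threshold=1.0))  # []
```

With a threshold of 1.0 the matcher degenerates to (near-)exact matching, so "the medication blocks" is no longer found. The linear scan costs time proportional to corpus length times pattern length, which is exactly what an index-based method has to avoid at billion-token scale.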
ClusterTalk: Corpus Exploration Framework using Multi-Dimensional Exploratory Search [3.4123736336071864]
ClusterTalk is a framework for corpus exploration using multi-dimensional exploratory search. Our system integrates document clustering with faceted search, allowing users to interactively refine their exploration and ask corpus and document-level queries.
arXiv (2024-12-19T05:11:16Z)
Ranking Narrative Query Graphs for Biomedical Document Retrieval (Technical Report) [7.527096697768715]
This paper extends our existing graph-based discovery system for the biomedical domain. It contributes effective graph-based unsupervised ranking methods, a new query relaxation paradigm, and ontological rewriting.
arXiv (2024-12-06T12:49:28Z)
ELCC: the Emergent Language Corpus Collection [1.6574413179773761]
The Emergent Language Corpus Collection (ELCC) is a collection of corpora collected from open source implementations of emergent communication systems. Each corpus is annotated with metadata describing the characteristics of the source system as well as a suite of analyses of the corpus.
arXiv (2024-07-04T21:23:18Z)
Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We introduce a novel retrieval unit, the proposition, for dense retrieval. Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv (2023-12-11T18:57:35Z)
DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies. We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv (2023-09-04T20:52:33Z)
SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation [11.186252009101077]
We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper. SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system.
arXiv (2023-06-06T09:34:45Z)
Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval [37.22592489907125]
We study how sparse language models can be used for dense retrieval to improve inference efficiency. We find that sparse language models can be used as direct replacements with little to no drop in accuracy and up to 4.3x improved inference speeds.
arXiv (2023-03-31T20:21:32Z)
Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms. Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time. Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv (2021-05-31T21:14:58Z)
Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks. We first represent both natural language query texts and programming language code snippets with the unified graph-structured data. In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv (2020-10-24T14:16:50Z)
Syntactic Search by Example [44.69040040007045]
We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. We introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations. Search is performed at an interactive speed due to an efficient linguistic graph-indexing and retrieval engine.
arXiv (2020-06-04T16:59:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.